Abstract

We present an analysis, under iterative decoding, of coset LDPC codes over GF(q), designed for use over arbitrary discrete-memoryless channels (particularly nonbinary and asymmetric channels). We use a random-coset analysis to produce an effect that is similar to output-symmetry with binary channels. We show that the random selection of the nonzero elements of the GF(q) parity-check matrix induces a permutation-invariance property on the densities of the decoder messages, which simplifies their analysis and approximation. We generalize several properties, including symmetry and stability, from the analysis of binary LDPC codes. We show that under a Gaussian approximation, the entire (q − 1)-dimensional distribution of the vector messages is described by a single scalar parameter (like the distributions of binary LDPC messages). We apply this property to develop EXIT charts for our codes. We use appropriately designed signal constellations to obtain substantial shaping gains. Simulation results indicate that our codes outperform multilevel codes at short block lengths. We also present simulation results for the AWGN channel, including results within 0.56 dB of the unconstrained Shannon limit (i.e., not restricted to any signal constellation) at a spectral efficiency of 6 bits/s/Hz.



Index Terms 

Bandwidth efficient coding, coset codes, iterative decoding, low-density parity-check (LDPC) codes. 
I. Introduction

In their seminal work, Richardson et al. [29], [28] developed an extensive analysis of LDPC codes over memoryless binary-input output-symmetric (MBIOS) channels. Using this analysis, they designed edge-distributions for LDPC codes at rates remarkably close to the capacity of several such channels. However, their analysis is mostly restricted to MBIOS channels. This rules out many important channels, including bandwidth-efficient channels, which require nonbinary channel alphabets.

To design nonbinary codes, Hou et al. [18] suggested starting off with binary LDPC codes either as components of a multilevel code (MLC) or a bit-interleaved coded modulation (BICM) scheme. Nonbinary channels are typically not output-symmetric, thus posing a problem to their analysis. To overcome this problem, Hou et al. used coset LDPC codes rather than plain LDPC codes. The use of coset LDPC codes was first suggested by Kavcic et al. [19] in the context of LDPC codes for channels with intersymbol interference (ISI).

To appear in IEEE Trans. Inf. Theory (submitted October 2004; revised and accepted for publication November 2005). The authors are with the School of Electrical Engineering, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel (e-mail: abn@eng.tau.ac.il; burstyn@eng.tau.ac.il). This research was supported by the Israel Science Foundation, grant no. 22/01-1, by an equipment grant from the Israel Science Foundation to the School of Computer Science at Tel Aviv University, and by a fellowship from The Yitzhak and Chaya Weinstein Research Institute for Signal Processing at Tel Aviv University. The material in this paper was presented in part at the 41st Annual Allerton Conference on Communications, Control and Computing, Monticello, Illinois, October 2003, and at the 2005 IEEE International Symposium on Information Theory, Adelaide, Australia.

MLC and BICM codes are frequently decoded using multistage and parallel decoding, respectively. Both methods are suboptimal in comparison to methods that rely only on belief-propagation decoding¹. Full belief-propagation decoding was considered by Varnica et al. [37] for MLC and by ourselves in [1] (using a variant of BICM LDPC called BQC-LDPC). However, both methods involve computations that are difficult to analyze.

An alternative approach to designing nonbinary codes starts off with nonbinary LDPC codes. Gallager [16] defined arbitrary-alphabet LDPC codes using modulo-q arithmetic. Nonbinary LDPC codes were also considered by Davey and MacKay [10] in the context of codes for binary-input channels. Their definition uses Galois field (GF(q)) arithmetic. In this paper we focus on GF(q) LDPC codes similar to those suggested in [10].

In [1] we considered coset GF(q) LDPC codes under maximum-likelihood (ML) decoding. We showed that appropriately designed coset GF(q) LDPC codes are capable of achieving the capacity of any discrete-memoryless channel. In this paper, we examine coset GF(q) LDPC codes under iterative decoding.

A straightforward implementation of the nonbinary belief-propagation decoder has a very large decoding complexity. However, we discuss an implementation method suggested by Richardson and Urbanke [28, Section V] that uses the multidimensional discrete Fourier transform (DFT). Coupled with an efficient algorithm for computing the multidimensional DFT, this method reduces the complexity dramatically, to that of the binary-based MLC and BICM schemes discussed above (when full belief-propagation decoding is used).

With binary LDPC codes, the BPSK signals ±1 are typically used instead of the {0, 1} symbols of the code 
alphabet, when transmitting over the AWGN channel. Similarly, with nonbinary LDPC codes, a straightforward 
choice would be to use a PAM or QAM signal constellation (which we indeed use in some of our simulations). 
However, with such constellations, the codes exhibit a shaping loss which, at high SNR, approaches 1.53 dB [13]. 
By carefully selecting the signal constellation, a substantial shaping gain can be achieved. Two approaches that we 
discuss are quantization mapping, which we have used in [1] (based on ideas by Gallager [17] and McEliece [25]) 
and nonuniform spacing (based on Sun and van Tilborg [33] and Fragouli et al. [14]). 

An important aid in the analysis of binary LDPC codes is density evolution, proposed by Richardson and 
Urbanke [28]. Density evolution enables computing the exact threshold of binary LDPC codes asymptotically 
at large block lengths. Using density evolution, Chung et al. [8] were able to present irregular LDPC codes 
within 0.0045 dB of the Shannon limit of the binary-input AWGN channel. Efficient algorithms for computing
density-evolution were proposed in [28] and [8]. 

Density evolution is heavily reliant on the output-symmetry of typical binary channels. In this paper, we show 

that focusing on coset-codes enables extension of the concepts of density-evolution to nonbinary LDPC codes. 

We examine our codes in a random-coset setting, where the average performance is evaluated over all possible realizations of the coset vector. Our approach is similar to the one used by Kavcic et al. [19] for binary channels with ISI. Random-coset analysis enables us to generalize several properties from the analysis of binary LDPC codes, including the all-zero codeword assumption² and the symmetry property of densities.

¹Multistage decoding involves transferring a hard decision on the decoded codeword (rather than a soft decision) from one component code to the next. It further does not benefit from feedback on this decision from subsequent decoders. Parallel decoding of BICM codes is bounded away from capacity, as discussed in [7].

In [9] and [35], approximations of density evolution were proposed that use a Gaussian assumption. These approximations track one-dimensional surrogates rather than the true densities, and are easier to implement. A different approach was used in [6] to develop one-dimensional surrogates that can be used to compute lower bounds on the decoding threshold.

Unlike binary LDPC codes, the problem of finding an efficient algorithm for computing density evolution for 
nonbinary LDPC codes remains open. This is a result of the fact that the messages transferred in nonbinary 
belief-propagation are multidimensional vectors rather than scalar values. Just storing the density of a non-scalar 
random variable requires an amount of memory that is exponential in the alphabet size. Nevertheless, we show that 
approximation using surrogates is very much possible. 

With LDPC codes over GF(q), the nonzero elements of the sparse parity-check matrix are selected at random
from GF(q)\{0}. In this paper, we show that this random selection induces an additional symmetry property on the 
distributions tracked by density-evolution, which we call permutation-invariance. We use permutation-invariance to 
generalize the stability property from binary LDPC codes. 

Gaussian approximation of nonbinary LDPC was first considered by Li et al. [22] in the context of transmission 
over binary-input channels. Their approximation uses (q − 1)-dimensional vector parameters to characterize the
densities of messages, under the assumption that the densities are approximately Gaussian. We show that assuming 
permutation-invariance, the densities may in fact be described by scalar, one-dimensional parameters, like the 
densities of binary LDPC. 

Finally, binary LDPC codes are commonly designed using EXIT charts, as suggested by ten Brink et al. [35]. 
EXIT charts are based on the Gaussian approximation of density-evolution. In this paper, we therefore use the 
generalization of this approximation to extend EXIT charts to coset GF(q) LDPC codes. Using EXIT charts, we design codes at several spectral efficiencies, including codes at a spectral efficiency of 6 bits/s/Hz within 0.56 dB of the unconstrained Shannon limit (i.e., when transmission is not restricted to any signal constellation). To the best of our knowledge, these are the best codes designed for this spectral efficiency. We also compare coset GF(q) LDPC
codes to codes constructed using multilevel coding and turbo-TCM, and provide simulation results that indicate 
that our codes outperform these schemes at short block-lengths. 

Our work is organized as follows. We begin by introducing some notation in Section II³. In Section III we formally define coset LDPC codes over GF(q) and ensembles of codes, and discuss mappings to the channel alphabet. In Section IV we present belief-propagation decoding of coset GF(q) LDPC codes, and discuss its efficient implementation. In Section V we discuss the all-zero codeword assumption, symmetry and channel equivalence. In Section VI we present density evolution for nonbinary LDPC codes and permutation-invariance. We also develop the stability property and the Gaussian approximation. In Section VII we discuss the design of LDPC codes using EXIT charts and present simulation results. In Section VIII we compare our codes with multilevel coding and turbo-TCM. Section IX presents ideas for further research and concludes the paper.

²Note that in [38], an approach to generalizing density evolution to asymmetric binary channels was proposed that does not require the all-zero codeword assumption.

³We have placed this section first for easy reference, although none of the notation is required to understand Section III.

II. Notation 

A. General Notation 

Vectors are typically denoted by boldface, e.g., $\mathbf{x}$. Random variables are denoted by upper-case letters, e.g., $X$, and their instantiations in lower-case, e.g., $x$. We allow an exception to this rule with random variables over GF(q), to enable neater notation.

For simplicity, throughout this paper, we generally assume discrete random variables (with one exception involving 
Gaussian approximation). The generalization to continuous variables is immediate. 

B. Probability and LLR Vectors 

An important difference between nonbinary and binary LDPC decoders is that the former use messages that 
are multidimensional vectors, rather than scalar values. Like the binary decoders, however, there are two possible 
representations for the messages: plain-likelihood probability-vectors or log-likelihood-ratio (LLR) vectors. 

A $q$-dimensional probability-vector is a vector $\mathbf{x} = (x_0, \ldots, x_{q-1})$ of real numbers such that $x_i \geq 0$ for all $i$ and $\sum_{i=0}^{q-1} x_i = 1$. The indices $i = 0, \ldots, q-1$ of each message vector's components are also interpreted as elements of GF(q). That is, each index $i$ is taken to mean the $i$th element of GF(q), given some enumeration of the field elements (we assume that indices 0 and 1 correspond to the zero and one elements of the field, respectively).

Given a probability-vector $\mathbf{x}$, the LLR values associated with it are defined as $w_i = \log(x_0/x_i)$, $i = 0, \ldots, q-1$ (a definition borrowed from [22]).

Notice that for all $\mathbf{x}$, $w_0 = 0$. We define the LLR-vector representation of $\mathbf{x}$ as the $(q-1)$-dimensional vector $\mathbf{w} = (w_1, \ldots, w_{q-1})$. For convenience, although $w_0$ is not defined as belonging to this vector, we will allow ourselves to refer to it with the implicit understanding that it is always equal to zero.

Given an LLR vector $\mathbf{w}$, the components of the corresponding probability-vector (the probability vector from which $\mathbf{w}$ was produced) can be obtained by

$$x_i = \mathrm{LLR}^{-1}(\mathbf{w})_i = \frac{e^{-w_i}}{1 + \sum_{k=1}^{q-1} e^{-w_k}}, \quad i = 0, \ldots, q-1 \tag{1}$$

We use the shorthand notation $\mathbf{x}'$ to denote the LLR-vector representation of a probability-vector $\mathbf{x}$. Similarly, if $\mathbf{w}$ is an LLR-vector, then $\mathbf{w}'$ is its corresponding probability-vector representation.
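A minimal Python sketch of the LLR mapping and its inverse (1) follows. It assumes that all components of $\mathbf{x}$ are strictly positive, so that the logarithms are finite. The function names `llr` and `llr_inv` are illustrative and do not come from the paper.

```python
import math

def llr(x):
    """LLR-vector (w_1, ..., w_{q-1}) of a probability vector x, with w_i = log(x_0 / x_i).

    Assumes every x_i > 0.
    """
    return [math.log(x[0] / x[i]) for i in range(1, len(x))]

def llr_inv(w):
    """Probability vector recovered from an LLR-vector via (1); w_0 = 0 is implicit."""
    expo = [1.0] + [math.exp(-wi) for wi in w]  # e^{-w_i}, with e^{-w_0} = 1
    total = sum(expo)                            # 1 + sum_{k=1}^{q-1} e^{-w_k}
    return [e / total for e in expo]

x = [0.5, 0.25, 0.125, 0.125]   # q = 4
w = llr(x)                      # (log 2, log 4, log 4)
x_back = llr_inv(w)             # equals x up to floating-point rounding
```

Because $w_0 = 0$ is never stored, the inverse prepends the constant $e^{0} = 1$ before normalizing.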

A probability-vector random variable is defined to be a $q$-dimensional random variable $\mathbf{X} = (X_0, \ldots, X_{q-1})$ that takes only valid probability-vector values. An LLR-vector random variable is a $(q-1)$-dimensional random variable $\mathbf{W} = (W_1, \ldots, W_{q-1})$.
C. The Operations $\times g$ and $+g$

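Given a probability vector $\mathbf{x}$ and an element $g \in \mathrm{GF}(q)$, we define the $+g$ operator in the following way (note that a different definition will shortly be given for LLR vectors):

$$\mathbf{x}^{+g} = (x_g, x_{1+g}, \ldots, x_{(q-1)+g}) \tag{2}$$

where addition is performed over GF(q). $\mathbf{x}^*$ is defined as the set

$$\mathbf{x}^* \triangleq \{\mathbf{x}, \mathbf{x}^{+1}, \ldots, \mathbf{x}^{+(q-1)}\} \tag{3}$$

We define $n(\mathbf{x})$ as the number of elements $g \in \mathrm{GF}(q)$ satisfying $\mathbf{x}^{+g} = \mathbf{x}$. For example, assuming GF(3) addition, $n([1, 0, 0]) = 1$ and $n([1/3, 1/3, 1/3]) = 3$. Note that $n(\mathbf{x}) \geq 1$ for all $\mathbf{x}$, because $\mathbf{x}^{+0} = \mathbf{x}$.
Similarly, we define

$$\mathbf{x}^{\times g} = (x_{0 \cdot g}, x_{1 \cdot g}, \ldots, x_{(q-1) \cdot g}) \tag{4}$$

where multiplication is performed over GF(q).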

Note that the operation $+g$ is reversible, and $(\mathbf{x}^{+g})^{-g} = \mathbf{x}$. Similarly, $\times g$ is reversible for all $g \neq 0$, and $(\mathbf{x}^{\times g})^{\times g^{-1}} = \mathbf{x}$. In Appendix I we summarize some additional properties of these operators that are used in this paper.
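The following Python sketch makes the operators concrete by implementing $\mathbf{x}^{+g}$, $\mathbf{x}^{\times g}$ and $n(\mathbf{x})$ for a prime field, where GF(q) arithmetic is ordinary integer arithmetic modulo $q$. Restricting to a prime $q$ (here $q = 3$, as in the GF(3) example in the text) is an assumption made only to keep the field arithmetic trivial. The function names are illustrative.

```python
from fractions import Fraction

P = 3  # assumption: a prime field GF(3), whose elements are the integers 0, 1, 2 (mod 3)

def plus(x, g):
    """x^{+g}: component i is x_{i+g}, with i + g computed mod P."""
    return [x[(i + g) % P] for i in range(P)]

def times(x, g):
    """x^{xg}: component i is x_{i*g}, with i * g computed mod P."""
    return [x[(i * g) % P] for i in range(P)]

def n(x):
    """Number of field elements g satisfying x^{+g} == x."""
    return sum(1 for g in range(P) if plus(x, g) == x)

third = Fraction(1, 3)
print(n([1, 0, 0]), n([third, third, third]))  # prints: 1 3

# Reversibility: (x^{+g})^{-g} = x and, for g != 0, (x^{xg})^{x g^{-1}} = x.
x = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
assert all(plus(plus(x, g), (-g) % P) == x for g in range(P))
assert all(times(times(x, g), pow(g, -1, P)) == x for g in range(1, P))
```

Exact `Fraction` arithmetic keeps the equality tests for $n(\mathbf{x})$ free of rounding effects.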

In the context of LLR vectors, we define the operation $+g$ differently. Given an LLR vector $\mathbf{w}$, we define $\mathbf{w}^{+g}$ using the corresponding probability vector. That is, $\mathbf{w}^{+g} = \mathrm{LLR}([\mathrm{LLR}^{-1}(\mathbf{w})]^{+g})$. Thus we obtain:

$$w^{+g}_i = w_{i+g} - w_g, \quad i = 1, \ldots, q-1 \tag{5}$$

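The operation $\times g$ is similarly defined as $\mathbf{w}^{\times g} = \mathrm{LLR}([\mathrm{LLR}^{-1}(\mathbf{w})]^{\times g})$. However, unlike the $+g$ operation, the resulting definition coincides with the definition for probability vectors, and

$$w^{\times g}_i = w_{i \cdot g}, \quad i = 1, \ldots, q-1$$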
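The following Python sketch checks (5) and the $\times g$ rule for LLR vectors numerically. It compares each rule with the route through the probability domain that defines it. A prime field (GF(5)) is assumed so that field arithmetic reduces to integers modulo 5. The helper names are illustrative.

```python
import math

P = 5  # assumption: prime field GF(5); i + g and i * g are computed mod 5

def llr(x):
    return [math.log(x[0] / x[i]) for i in range(1, P)]

def plus_prob(x, g):
    return [x[(i + g) % P] for i in range(P)]

def times_prob(x, g):
    return [x[(i * g) % P] for i in range(P)]

def plus_llr(w, g):
    """(5): w^{+g}_i = w_{i+g} - w_g, i = 1..q-1, using the implicit w_0 = 0."""
    full = [0.0] + list(w)
    return [full[(i + g) % P] - full[g] for i in range(1, P)]

def times_llr(w, g):
    """LLR rule for x g: w^{xg}_i = w_{i*g}, i = 1..q-1 (g != 0)."""
    full = [0.0] + list(w)
    return [full[(i * g) % P] for i in range(1, P)]

def close(u, v):
    return all(abs(a - b) < 1e-12 for a, b in zip(u, v))

x = [0.1, 0.4, 0.2, 0.2, 0.1]
w = llr(x)
assert all(close(plus_llr(w, g), llr(plus_prob(x, g))) for g in range(P))
assert all(close(times_llr(w, g), llr(times_prob(x, g))) for g in range(1, P))
```

The subtraction of $w_g$ in (5) renormalizes the LLRs so that the new zeroth component is again 0.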

III. Coset GF(q) LDPC Codes Defined

We begin in Section III-A by defining LDPC codes over GF(q). We proceed in Section III-B to define coset GF(q) LDPC codes. In Section III-C we define the concept of mappings, by which coset GF(q) LDPC codes are tailored to specific channels. In Section III-D we discuss ensembles of coset GF(q) LDPC codes.

A. LDPC Codes over GF(q) 

A GF(q) LDPC code is defined in a way similar to binary LDPC codes, using a bipartite Tanner graph [34]. The
graph has N variable (left) nodes, corresponding to codeword symbols, and M check (right) nodes corresponding 
to parity-checks. 

Two important differences distinguish GF(q) LDPC codes from their binary counterparts. Firstly, the codeword elements are selected from the entire field GF(q). Hence, each variable-node is assigned a symbol from GF(q), rather than just a binary digit. Secondly, at each edge $(i, j)$ of the Tanner graph, a label $g_{i,j} \in \mathrm{GF}(q) \setminus \{0\}$ is defined. Figure 1 illustrates the labels at the edges adjacent to some check node of an LDPC code's bipartite graph (the digits 1, 2 and 5 represent nonzero elements of GF(q)).

Fig. 1. Schematic diagram of a GF(q) LDPC bipartite graph.

A word $\mathbf{c}$ with components from GF(q) is a codeword if at each check-node $j$, the following equation holds:

$$\sum_{i \in \mathcal{N}(j)} g_{i,j} c_i = 0$$

where $\mathcal{N}(j)$ is the set of variable nodes adjacent to $j$. The GF(q) LDPC code's parity-check matrix can easily be obtained from its bipartite graph (see [1]).

As with binary LDPC codes, we say that a GF(q) LDPC code is regular if all variable-nodes in its Tanner graph 
have the same degree, and all check-nodes have the same degree. Otherwise, we say it is irregular. 

B. Coset GF(q) LDPC Codes 

As mentioned in Section I, rather than use plain GF(q) LDPC codes, it is useful instead to consider coset codes. In doing so, we follow the example of Elias [12] with binary codes.

Definition 1: Given a length-$N$ linear code $\mathcal{C}$ and a length-$N$ vector $\mathbf{v}$ over GF(q), the code $\{\mathbf{c} + \mathbf{v} : \mathbf{c} \in \mathcal{C}\}$ (i.e., obtained by adding $\mathbf{v}$ to each of the codewords of $\mathcal{C}$) is called a coset code. Note that the addition is performed componentwise over GF(q). $\mathbf{v}$ is called the coset vector.

The use of coset codes, as we will later see, is a valuable asset to rigorous analysis and is easily accounted for in 
the decoding process. 

C. Mapping to the Channel Signal Set 

With binary LDPC codes, the BPSK signals ±1 are typically used instead of the {0, 1} symbols of the code alphabet. With nonbinary LDPC codes, we denote the signal constellation by $\mathcal{A}$ and the mapping from the code alphabet (GF(q)) by $\delta(\cdot)$. When designing codes for transmission over an AWGN channel, a pulse amplitude modulation (PAM) or quadrature amplitude modulation (QAM) constellation is a straightforward choice for $\mathcal{A}$. In Section VII we present codes where $\mathcal{A}$ is a PAM signal constellation. However, we now show that more careful attention to the design of the signal constellation can produce a substantial gain in performance.

In [1] we have shown that ensembles of GF(q) LDPC codes resemble uniform random-coding ensembles. That is, the empirical distribution of GF(q) symbols in nearly all codewords is approximately uniform. Equivalently, for a given codeword $\mathbf{c}$, $\Pr[c = g] \approx 1/q$ for all $g \in \mathrm{GF}(q)$, where $c$ is a randomly selected codeword symbol. Such codes are useful for transmission over symmetric channels, where the capacity-achieving distribution is uniform [17]. However, to approach capacity over asymmetric channels (and overcome the shaping gap [13]), we need the symbol distribution to be nonuniform. For example, to approach capacity over the AWGN channel, we need the distribution to resemble a Gaussian distribution.

Fig. 2. An example of quantization mapping.

One solution to this problem is a variant of an idea by Gallager [17]. The approach begins with a mapping of symbols from GF(q) (the code alphabet) into the channel input alphabet. We typically use a code alphabet that is larger than the channel input alphabet. By mapping several GF(q) symbols into each channel symbol (rather than using a one-to-one mapping), we can control the probability of each channel symbol. For example, in Fig. 2 we examine a channel alphabet $\mathcal{A} = \{a, b, c\}$, and a quantization mapping that is designed to achieve the distribution $Q(a) = Q(b) = 3/8$, $Q(c) = 1/4$ (the digits 0, ..., 7 represent elements of GF(8)). We call this a quantization mapping because the mapping is many-to-one.
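The following Python sketch applies the quantization idea (formalized in Definition 2 below) to the GF(8) example above. The particular assignment of GF(8) elements to channel symbols (0, 1, 2 to $a$; 3, 4, 5 to $b$; 6, 7 to $c$) is a hypothetical choice and need not match Fig. 2. Assuming uniformly distributed code symbols, only the number of elements per channel symbol determines the induced distribution.

```python
from collections import Counter
from fractions import Fraction

def quantization_mapping(counts):
    """Many-to-one map from field indices {0, ..., q-1} to channel symbols.

    counts[a] = N_a = q * Q(a); consecutive indices are assigned to each symbol
    in turn (a hypothetical ordering, chosen only for illustration).
    """
    delta, k = {}, 0
    for symbol, n_a in counts.items():
        for _ in range(n_a):
            delta[k] = symbol
            k += 1
    return delta

delta = quantization_mapping({"a": 3, "b": 3, "c": 2})  # q = 3 + 3 + 2 = 8
q = len(delta)
# With code symbols uniform over GF(8), channel symbol a appears with probability N_a / q.
Q = {a: Fraction(n_a, q) for a, n_a in Counter(delta.values()).items()}
```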

Formally, we define quantization mapping as follows: 

Definition 2: Let $Q(\cdot)$ be a rational probability assignment of the form $Q(a) = N_a/q$, for all $a \in \mathcal{A}$. A quantization $\delta(\cdot) = \delta_Q(\cdot)$ associated with $Q(a)$ is a mapping from a set of GF(q) elements to $\mathcal{A}$ such that the number of elements mapped to each $a \in \mathcal{A}$ is $q \cdot Q(a)$.

Quantizations are designed for finite channel input alphabets and rational-valued probability assignments. 
However, other probability assignments can be approximated arbitrarily closely. Independently of our work, a similar
approach was developed by Ratzer and MacKay [26] (note that their approach does not involve coset codes). 

A similar approach to designing mappings is based on Sun and van Tilborg [33] and Fragouli et al. [14] and 
is suitable for channels with continuous-input alphabets (like the AWGN channel). Instead of mapping many code 
symbols into each channel symbol, they used a one-to-one mapping to a set A of channel input signals that are 
non-uniformly spaced. To approximate a Gaussian input distribution, for example, the signals could be spaced more 
densely around zero. 

Given a mapping $\delta(\cdot)$ over GF(q), we define the mapping of a vector $\mathbf{v}$ with symbols in GF(q) as the vector obtained by applying $\delta(\cdot)$ to each of its symbols. The mapping of a code is the code obtained by applying the mapping to each of the codewords.
Fig. 3. Encoding of coset GF(q) LDPC codes.



It is useful to model coset GF(q) LDPC encoding as a sequence of operations, as shown in Fig. 3. An incoming message is encoded into a codeword of the underlying GF(q) LDPC code $\mathcal{C}$. The coset vector $\mathbf{v}$ is then added, and a mapping $\delta(\cdot)$ is applied. In the sequel, we will refer to the resulting codeword as a coset GF(q) LDPC codeword, although strictly speaking, the mapping $\delta(\cdot)$ is not included in Definition 1. Finally, the resulting codeword is transmitted over the channel.
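The last two steps of Fig. 3 (adding the coset vector and applying the mapping) can be sketched in Python as follows. The sketch makes several assumptions:

- $q = 8 = 2^3$, with field elements represented by their 3-bit coefficient vectors, so that GF(8) addition is bitwise XOR.
- A codeword $\mathbf{c}$ of the underlying code is taken as given (the all-zero word, which belongs to every linear code), and the LDPC encoder itself is not implemented.
- The many-to-one mapping used here is hypothetical and serves only as an illustration.

```python
import random

def coset_encode(c, v, delta):
    """Add coset vector v to codeword c over GF(2^m) (bitwise XOR), then apply delta symbolwise."""
    return [delta[ci ^ vi] for ci, vi in zip(c, v)]

q = 8
delta = dict(enumerate("aaabbbcc"))       # hypothetical quantization mapping for GF(8)
c = [0, 0, 0, 0, 0, 0]                     # the all-zero codeword of the underlying linear code
rng = random.Random(1)
v = [rng.randrange(q) for _ in c]          # coset vector: i.i.d. uniform over GF(8)
x = coset_encode(c, v, delta)              # channel input sequence
```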



D. $(\lambda, \rho, \delta)$ Ensembles of Coset GF(q) LDPC Codes

As in the case of standard, binary LDPC codes, the analysis of coset GF(q) LDPC focuses on the average 
behavior of codes selected at random from an ensemble of codes. 

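The following method, due to Luby et al. [24], is used to construct irregular bipartite Tanner graphs. The graphs are characterized by two probability vectors,

$$\boldsymbol{\lambda} = (\lambda_2, \ldots, \lambda_c), \qquad \boldsymbol{\rho} = (\rho_2, \ldots, \rho_d)$$

For convenience we also define the polynomials $\lambda(x) = \sum_{i=2}^{c} \lambda_i x^{i-1}$ and $\rho(x) = \sum_{j=2}^{d} \rho_j x^{j-1}$.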

In a $(\lambda, \rho)$ Tanner graph, for each $i$ a fraction $\lambda_i$ of the edges has left degree $i$, and for each $j$ a fraction $\rho_j$ of the edges has right degree $j$. Letting $E$ denote the total number of edges, we obtain that there are $\lambda_i E/i$ left-nodes with degree $i$, and $\rho_j E/j$ right-nodes with degree $j$. Letting $N$ denote the number of left-nodes and $M$ denote the number of right-nodes, we have

$$N = E \sum_{i=2}^{c} \frac{\lambda_i}{i}, \qquad M = E \sum_{j=2}^{d} \frac{\rho_j}{j}$$

Luby et al. suggested the following method for constructing $(\lambda, \rho)$ bipartite graphs. The $E$ edges originating from left nodes are numbered from 1 to $E$. The same procedure is applied to the $E$ edges originating from right nodes. A permutation $\pi$ is then chosen with uniform probability from the space of all permutations of $\{1, 2, \ldots, E\}$. Finally, for each $i$, the edge numbered $i$ on the left side is associated with the edge numbered $\pi_i$ on the right side. Note that occasionally, multiple edges may link a pair of nodes.
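The following Python sketch implements the socket construction described above, with nodes and edge sockets numbered from 0 rather than 1. The function names and the regular (3,6) example are illustrative assumptions. The node counts are computed from the edge-perspective fractions using the formulas for $N$ and $M$.

```python
import random
from fractions import Fraction

def node_counts(lam, rho, E):
    """N = E * sum_i lam_i / i and M = E * sum_j rho_j / j (edge-perspective fractions)."""
    N = E * sum(Fraction(l) / i for i, l in lam.items())
    M = E * sum(Fraction(r) / j for j, r in rho.items())
    return N, M

def sample_tanner_graph(left_degrees, right_degrees, rng):
    """Number the E sockets on each side, draw a uniform permutation pi, and join
    left socket e to right socket pi[e]. Multiple edges between a pair of nodes may occur."""
    left = [i for i, d in enumerate(left_degrees) for _ in range(d)]
    right = [j for j, d in enumerate(right_degrees) for _ in range(d)]
    assert len(left) == len(right), "both sides must carry the same number of edges E"
    pi = list(range(len(left)))
    rng.shuffle(pi)  # uniformly random permutation of {0, ..., E-1}
    return [(left[e], right[pi[e]]) for e in range(len(left))]

# Regular (3,6) example: lambda_3 = 1, rho_6 = 1, E = 36 edges.
N, M = node_counts({3: 1}, {6: 1}, 36)           # N = 12, M = 6
edges = sample_tanner_graph([3] * 12, [6] * 6, random.Random(0))
```

Node degrees are preserved by construction, since each node contributes exactly as many sockets as its degree.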

A $(\lambda, \rho)$ GF(q) LDPC code is constructed from a $(\lambda, \rho)$ Tanner graph by random i.i.d. selection of the labels with uniform probability from GF(q)\{0}, at each edge. Given a mapping $\delta(\cdot)$, a $(\lambda, \rho, \delta)$ coset GF(q) LDPC code is created by applying $\delta(\cdot)$ to a coset of a $(\lambda, \rho)$ GF(q) LDPC code. The coset vector $\mathbf{v}$ is generated by random uniform i.i.d. selection of its components from GF(q).

Summarizing, a random selection of a code from a $(\lambda, \rho, \delta)$ coset GF(q) LDPC ensemble amounts to a random construction of its Tanner graph, a random selection of its labels and a random selection of a coset vector.
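Given a sampled graph, the remaining two random choices can be sketched in Python as below. Field elements are represented by the indices 0, ..., q − 1, with 0 the zero element, as in Section II. The edge list is a hypothetical toy input. Labels are stored per edge index because parallel edges may occur.

```python
import random

def sample_labels_and_coset(num_edges, N, q, rng):
    """Return i.i.d. uniform nonzero labels (one per edge) and an i.i.d. uniform coset vector."""
    labels = [rng.randrange(1, q) for _ in range(num_edges)]  # uniform over GF(q) \ {0}
    v = [rng.randrange(q) for _ in range(N)]                   # uniform over GF(q)
    return labels, v

edges = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]   # toy graph: 3 variable nodes, 2 checks
labels, v = sample_labels_and_coset(len(edges), N=3, q=8, rng=random.Random(2))
```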
The rate of a $(\lambda, \rho, \delta)$ coset GF(q) LDPC code is equal to the rate of its underlying GF(q) LDPC code. The design rate $R$ of a $(\lambda, \rho)$ GF(q) LDPC code is defined as

$$R \triangleq 1 - \frac{\sum_{j=2}^{d} \rho_j / j}{\sum_{i=2}^{c} \lambda_i / i}$$

This value is a lower bound on the true rate of the code, measured in $q$-ary symbols per channel use.
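As a worked example of the design rate, the Python sketch below evaluates $R$ exactly with rational arithmetic. The two degree distributions are illustrative choices, not codes from this paper.

```python
from fractions import Fraction

def design_rate(lam, rho):
    """R = 1 - (sum_j rho_j / j) / (sum_i lam_i / i), i.e., R = 1 - M / N."""
    left = sum(Fraction(l) / i for i, l in lam.items())
    right = sum(Fraction(r) / j for j, r in rho.items())
    return 1 - right / left

regular = design_rate({3: 1}, {6: 1})                                      # 1 - (1/6)/(1/3) = 1/2
irregular = design_rate({2: Fraction(1, 2), 3: Fraction(1, 2)}, {6: 1})    # 1 - (1/6)/(5/12) = 3/5
```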

IV. Belief-Propagation Decoding of Coset GF(q) LDPC Codes
A. Definition of the Decoder 

The coset GF(q) LDPC belief-propagation decoder is based on Gallager [16] and Kschischang et al. [21]. The decoder attempts to recover $\mathbf{c}$, the codeword of the underlying GF(q) LDPC code. Decoding consists of alternating
rightbound and leftbound iterations. In a rightbound iteration, messages are sent from variable-nodes to check-nodes. 
In a leftbound iteration, the opposite occurs. Note that with this terminology, a rightbound message is produced at 
a left node (a variable-node) and a leftbound message is produced at a right node (a check-node). 

As mentioned in Section II, the decoder's messages are $q$-dimensional probability vectors, rather than scalar values as in standard binary LDPC decoding.

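Algorithm 1: Perform the following steps, alternately:

1) Rightbound iteration. For all edges $e = (i, j)$ do the following in parallel:

If this is iteration zero, set the rightbound message $\mathbf{r} = \mathbf{r}(i, j)$ to the initial message $\mathbf{r}^{(0)} = \mathbf{r}^{(0)}(i)$, whose components are defined as follows:

$$r^{(0)}_k = \frac{\Pr[y_i \mid \delta(k + v_i)]}{\sum_{k'=0}^{q-1} \Pr[y_i \mid \delta(k' + v_i)]} \tag{7}$$

$y_i$ and $v_i$ are the channel output and the element of the coset vector $\mathbf{v}$ corresponding to variable node $i$. The addition operation $k + v_i$ is performed over GF(q).
Otherwise (iteration number 1 and above),

$$r_k = \frac{r^{(0)}_k \prod_{n=1}^{d_i-1} l^{(n)}_k}{\sum_{k'=0}^{q-1} r^{(0)}_{k'} \prod_{n=1}^{d_i-1} l^{(n)}_{k'}} \tag{8}$$

where $d_i$ is the degree of the node $i$ and $\mathbf{l}^{(1)}, \ldots, \mathbf{l}^{(d_i-1)}$ denote the incoming (leftbound) messages across the edges $\{(i, j') : j' \in \mathcal{N}(i) \setminus j\}$, $\mathcal{N}(i)$ denoting the set of nodes adjacent to $i$.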

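2) Leftbound iteration. For all edges $e = (i, j)$ do the following in parallel:
Set the components of the leftbound message $\mathbf{l} = \mathbf{l}(j, i)$ as follows:

$$l_k = \sum_{\substack{a_1, \ldots, a_{d_j-1} \in \mathrm{GF}(q) \\ \sum_{n=1}^{d_j-1} g_n a_n + g_{d_j} k = 0}} \ \prod_{n=1}^{d_j-1} r^{(n)}_{a_n} \tag{9}$$

where $d_j$ is the degree of node $j$, and $\mathbf{r}^{(1)}, \ldots, \mathbf{r}^{(d_j-1)}$ denote the rightbound messages across the edges $\{(i', j) : i' \in \mathcal{N}(j) \setminus i\}$ and $g_1, \ldots, g_{d_j-1}$ are the labels on those edges. $g_{d_j}$ denotes the label on the edge $(i, j)$. The summations and multiplications of the indices $a_n$ and the labels $g_n$ are performed over GF(q). Note that an equivalent, simpler expression will be given shortly.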
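The two update rules of Algorithm 1 can be sketched directly in Python. This brute-force version of (9) enumerates all $q^{d_j-1}$ index tuples and is meant only to make the definitions concrete. It assumes a prime field (here GF(5)) so that label arithmetic is integer arithmetic modulo 5. The function names are illustrative.

```python
import itertools
import math

P = 5  # assumption: prime q = 5, so GF(q) arithmetic is integer arithmetic mod 5

def rightbound(r0, incoming):
    """(8): multiply r^(0) by the incoming leftbound messages componentwise, then normalize."""
    prod = [r0[k] * math.prod(l[k] for l in incoming) for k in range(P)]
    total = sum(prod)
    return [v / total for v in prod]

def leftbound(incoming, labels, g_out):
    """(9): l_k sums prod_n r^(n)_{a_n} over tuples with sum_n g_n a_n + g_out * k = 0.

    Each tuple (a_1, ..., a_{d-1}) satisfies the check for exactly one k, namely
    k = -(sum_n g_n a_n) * g_out^{-1}, so its product is added to that component.
    """
    g_out_inv = pow(g_out, -1, P)
    out = [0.0] * P
    for a in itertools.product(range(P), repeat=len(incoming)):
        s = sum(g * an for g, an in zip(labels, a)) % P
        out[(-s * g_out_inv) % P] += math.prod(r[an] for r, an in zip(incoming, a))
    return out

# Degree-2 check with labels 1 and 1: c_1 + c_2 = 0, so strong belief in c_1 = 1 implies c_2 = 4.
l = leftbound([[0.1, 0.6, 0.1, 0.1, 0.1]], labels=[1], g_out=1)
r = rightbound([0.2] * 5, [[0.5, 0.5, 0, 0, 0], [0.25, 0.75, 0, 0, 0]])
```

The leftbound output needs no explicit normalization, because summing the products over all tuples already gives 1.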
If $\mathbf{x}$ is a rightbound (leftbound) message from (to) a variable-node, then element $x_k$ represents an estimate of the a-posteriori probability (APP) that the corresponding code symbol is $k$, given the channel observations in a corresponding neighborhood graph (we will elaborate on this in Section IV-C). The decision associated with $\mathbf{x}$ is defined as follows: the decoder decides on the symbol $k$ that maximizes $x_k$. If the maximum is obtained at several indices, a uniform random selection is made among them.

In our analysis, we focus on the probability that a rightbound or leftbound message is erroneous (i.e., corresponds 
to an incorrect decision). However, in a practical setting, the decoder stops after a fixed number of decoding iterations 
and computes, at each variable-node $i$, a final vector $\mathbf{r}(i)$ of APP values. The vector is computed using (8), replacing $\mathcal{N}(i) \setminus j$ with $\mathcal{N}(i)$. $\mathbf{r}(i)$ is unique to each variable-node (unlike rightbound or leftbound messages), and can thus be used to compute a final decision on its value.

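Consider expression (9) for computing the leftbound messages. A useful, equivalent expression is given by,

$$\mathbf{l} = \left[ \bigotimes_{n=1}^{d_j-1} \left(\mathbf{r}^{(n)}\right)^{\times g_n^{-1}} \right]^{\times (-g_{d_j})} \tag{10}$$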
where $\mathbf{l}$ is the entire leftbound vector (rather than a component as in (9)) and the $\times$ operator is defined as in (4). The GF(q) convolution operator $\otimes$ is defined as an operation between two vectors, which produces a vector whose components are given by,

$$\left[\mathbf{x}^{(1)} \otimes \mathbf{x}^{(2)}\right]_k = \sum_{a \in \mathrm{GF}(q)} x^{(1)}_a \cdot x^{(2)}_{k-a}, \quad k \in \mathrm{GF}(q) \tag{11}$$

where the subtraction $k - a$ is evaluated over GF(q). Throughout the paper, the following definitions are useful:

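$$\tilde{\mathbf{l}} \triangleq \mathbf{l}^{\times(-g_{d_j}^{-1})}, \qquad \tilde{\mathbf{r}}^{(n)} \triangleq \left(\mathbf{r}^{(n)}\right)^{\times g_n^{-1}}, \quad n = 1, \ldots, d_j - 1 \tag{12}$$

Using these definitions, (10) may be further rewritten as,

$$\tilde{\mathbf{l}} = \bigotimes_{n=1}^{d_j-1} \tilde{\mathbf{r}}^{(n)} \tag{13}$$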
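The following Python sketch checks numerically that the convolution form (10), equivalently (12) and (13), agrees with the direct summation (9) on random messages. A prime field (GF(5)) is assumed so that field arithmetic is integer arithmetic modulo 5. The function names are illustrative.

```python
import itertools
import math
import random

P = 5  # assumption: prime field GF(5)

def times(x, g):
    """x^{xg}: component i is x_{i*g} (mod P)."""
    return [x[(i * g) % P] for i in range(P)]

def gf_conv(x1, x2):
    """(11): [x1 (*) x2]_k = sum_a x1_a * x2_{k-a}, with k - a computed mod P."""
    return [sum(x1[a] * x2[(k - a) % P] for a in range(P)) for k in range(P)]

def leftbound_conv(incoming, labels, g_out):
    """(12)-(13) to get l~, then l = l~^{x(-g_out)} as in (10)."""
    l_tilde = [1.0] + [0.0] * (P - 1)  # unit mass at 0: the identity of GF(q) convolution
    for r, g in zip(incoming, labels):
        l_tilde = gf_conv(l_tilde, times(r, pow(g, -1, P)))  # convolve with r~(n)
    return times(l_tilde, (-g_out) % P)

def leftbound_brute(incoming, labels, g_out):
    """(9) by exhaustive enumeration of (a_1, ..., a_{d-1})."""
    out = [0.0] * P
    for a in itertools.product(range(P), repeat=len(incoming)):
        s = sum(g * an for g, an in zip(labels, a)) % P
        out[(-s * pow(g_out, -1, P)) % P] += math.prod(r[an] for r, an in zip(incoming, a))
    return out

rng = random.Random(7)

def random_prob():
    x = [rng.random() for _ in range(P)]
    total = sum(x)
    return [v / total for v in x]

incoming = [random_prob() for _ in range(3)]       # check degree d_j = 4
labels = [rng.randrange(1, P) for _ in incoming]   # g_1, g_2, g_3
g_out = rng.randrange(1, P)                        # g_{d_j}
```

The convolution form costs $O(d_j q^2)$ per message here, versus $O(d_j q^{d_j-1})$ for the enumeration.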

Like the standard binary LDPC belief-propagation decoder, the coset GF(q) LDPC decoder also has an equivalent formulation using LLR messages.

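Algorithm 2: Perform the following steps, alternately:

1) Rightbound iteration. For all edges $e = (i, j)$ do the following in parallel:

If this is iteration zero, set the LLR rightbound message $\mathbf{r}' = \mathbf{r}'(i, j)$ to $\mathbf{r}'^{(0)} = \mathbf{r}'^{(0)}(i)$, whose components are defined as follows:

$$r'^{(0)}_k = \log \frac{\Pr[y_i \mid \delta(v_i)]}{\Pr[y_i \mid \delta(k + v_i)]}, \quad k = 1, \ldots, q-1 \tag{14}$$

Otherwise (iteration number 1 and above),

$$\mathbf{r}' = \mathbf{r}'^{(0)} + \sum_{n=1}^{d_i-1} \mathbf{l}'^{(n)} \tag{15}$$

where $d_i$ is the degree of the node $i$ and $\mathbf{l}'^{(1)}, \ldots, \mathbf{l}'^{(d_i-1)}$ denote the incoming (leftbound) LLR messages across the edges $\{(i, j') : j' \in \mathcal{N}(i) \setminus j\}$. Addition between vectors is performed componentwise.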
2) Leftbound iteration. All rightbound messages are converted from LLR to plain-likelihood representation. Expression (9) is applied to obtain the plain-likelihood representation of the leftbound messages. Finally, the leftbound messages are converted back to their corresponding LLR representation.

Both versions of the decoder have similar execution times. However, the LLR representation is sometimes useful in the analysis of the decoders' performance. Note that Wymeersch et al. [39] have developed an alternative decoder that uses LLR representation, which does not require the conversion to plain-likelihood representation that is used in the leftbound iteration of the above algorithm.
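A quick numerical check in Python that the LLR rightbound update (15) is simply the LLR of the plain-likelihood update (8). The message values are arbitrary illustrative numbers.

```python
import math

def llr(x):
    """w_i = log(x_0 / x_i), i = 1..q-1."""
    return [math.log(x[0] / x[i]) for i in range(1, len(x))]

def rightbound_prob(r0, incoming):
    """(8): componentwise product, normalized to sum to 1."""
    prod = [r0[k] * math.prod(l[k] for l in incoming) for k in range(len(r0))]
    total = sum(prod)
    return [v / total for v in prod]

def rightbound_llr(r0_llr, incoming_llr):
    """(15): componentwise sum of the initial and incoming LLR vectors."""
    return [sum(vals) for vals in zip(r0_llr, *incoming_llr)]

r0 = [0.4, 0.3, 0.2, 0.1]
ls = [[0.25, 0.25, 0.3, 0.2], [0.1, 0.5, 0.2, 0.2]]
via_prob = llr(rightbound_prob(r0, ls))
via_llr = rightbound_llr(llr(r0), [llr(l) for l in ls])
```

The normalization in (8) cancels in every ratio $x_0/x_i$, which is why the LLR update needs no normalization step.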



B. Efficient Implementation 

To compute rightbound messages, we can save time by computing the numerators separately, and then normalizing the sum to 1. At a variable node of degree $d_i$, the computation of each rightbound message takes $O(q \cdot d_i)$ computations.

A straightforward computation of the leftbound messages at a check-node of degree $d_j$ has a complexity of $O(d_j q^{d_j-1})$ per leftbound message, and a total of $O(d_j^2 q^{d_j-1})$ for all messages combined. We will now review a method due to Richardson and Urbanke [28] (developed for the decoding of standard GF(q) LDPC codes) that significantly reduces this complexity. This method assumes plain-likelihood representation of messages. It is nonetheless relevant to the implementation of Algorithm 2, which uses LLR representation, because with this algorithm the leftbound messages are computed by converting the rightbound messages to plain-likelihood representation, applying (9), and converting the result back to LLR representation.

We first recount some properties of Galois fields (see, e.g., [5] for a more extensive discussion). Galois fields GF(q) exist for values of $q$ equal to $p^m$, where $p$ is a prime number and $m$ is a positive integer. Each element of GF($p^m$) can be represented as an $m$-dimensional vector over $\{0, \ldots, p-1\}$. The sum (difference) of two GF($p^m$) elements corresponds to the sum (difference) of the vectors, evaluated as the modulo-$p$ sums (differences) of the vectors' components.

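Consider the GF(q) convolution operator, defined by (11) and used in the process of computing the leftbound message in (10). We now replace the GF(q) indices $a$ and $k$ in (11) with their vector representations $\boldsymbol{\alpha}, \boldsymbol{\kappa} \in \{0, \ldots, p-1\}^m$. The expression can be rewritten as

$$\left[\mathbf{x}^{(1)} \otimes \mathbf{x}^{(2)}\right]_{\boldsymbol{\kappa}} = \sum_{\boldsymbol{\alpha} \in \{0, \ldots, p-1\}^m} x^{(1)}_{\boldsymbol{\alpha}} \cdot x^{(2)}_{(\boldsymbol{\kappa} - \boldsymbol{\alpha}) \bmod p}, \quad \boldsymbol{\kappa} \in \{0, \ldots, p-1\}^m \tag{16}$$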
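The following Python sketch illustrates (16) for GF(9) = GF($3^2$). Field elements are indexed by their base-3 digit vectors, addition and subtraction act digitwise modulo 3, and the GF(q) convolution becomes a two-dimensional cyclic convolution. The digit-vector enumeration is an assumption. Multiplication depends on the choice of irreducible polynomial, but it is not needed here.

```python
p, m = 3, 2      # assumption: GF(9) = GF(3^2)
q = p ** m

def to_vec(k):
    """Integer index k -> digit vector (alpha_1, ..., alpha_m) in base p."""
    return tuple((k // p ** t) % p for t in range(m))

def to_int(vec):
    return sum(d * p ** t for t, d in enumerate(vec))

def gf_sub(a, b):
    """GF(p^m) subtraction: digitwise (a - b) mod p."""
    return to_int(tuple((x - y) % p for x, y in zip(to_vec(a), to_vec(b))))

def gf_add(a, b):
    """GF(p^m) addition: digitwise (a + b) mod p."""
    return to_int(tuple((x + y) % p for x, y in zip(to_vec(a), to_vec(b))))

def conv(x1, x2):
    """(16): [x1 (*) x2]_kappa = sum_alpha x1_alpha * x2_{(kappa - alpha) mod p}."""
    return [sum(x1[a] * x2[gf_sub(k, a)] for a in range(q)) for k in range(q)]

def unit(k):
    """Probability vector with all its mass on element k."""
    return [1 if i == k else 0 for i in range(q)]
```

Convolving two point masses yields the point mass at the field sum, e.g. `conv(unit(5), unit(7)) == unit(gf_add(5, 7))`.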



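Consider, for example, the simple case of $m = 2$. (16) becomes

$$\left[\mathbf{x}^{(1)} \otimes \mathbf{x}^{(2)}\right]_{\kappa_1, \kappa_2} = \sum_{\alpha_1=0}^{p-1} \sum_{\alpha_2=0}^{p-1} x^{(1)}_{\alpha_1, \alpha_2} \cdot x^{(2)}_{(\kappa_1 - \alpha_1) \bmod p, \, (\kappa_2 - \alpha_2) \bmod p}, \quad (\kappa_1, \kappa_2) \in \{0, \ldots, p-1\}^2 \tag{17}$$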



The right-hand side of (17) is the output of the two-dimensional cyclic convolution of $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$, evaluated at $(\kappa_1, \kappa_2)$. In the general case we have the $m$-dimensional cyclic convolution. This convolution can equivalently be evaluated using the $m$-dimensional DFT ($m$-DFT) and IDFT ($m$-IDFT) [11, page 71]. Thus, (13) can be rewritten as

$$\tilde{\mathbf{l}} = \mathrm{IDFT}\left( \prod_{n=1}^{d_j-1} \mathrm{DFT}\left(\tilde{\mathbf{r}}^{(n)}\right) \right)$$

where the multiplication of the DFT vectors is performed componentwise ($\mathbf{l}$ can be evaluated from $\tilde{\mathbf{l}}$ by $\mathbf{l} = \tilde{\mathbf{l}}^{\times(-g_{d_j})}$).

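Let $\mathbf{d}$ denote the DFT vector of a $q$-dimensional probability vector $\mathbf{x}$. The components of $\mathbf{d}$ and $\mathbf{x}$ are related by the equations [11, page 65]

$$d_{\beta_1, \ldots, \beta_m} = \sum_{\alpha_1, \ldots, \alpha_m \in \{0, \ldots, p-1\}} x_{\alpha_1, \ldots, \alpha_m} \, e^{-\imath \frac{2\pi}{p} \sum_{k=1}^{m} \beta_k \alpha_k} \tag{DFT}$$

$$x_{\alpha_1, \ldots, \alpha_m} = \frac{1}{q} \sum_{\beta_1, \ldots, \beta_m \in \{0, \ldots, p-1\}} d_{\beta_1, \ldots, \beta_m} \, e^{\imath \frac{2\pi}{p} \sum_{k=1}^{m} \beta_k \alpha_k} \tag{IDFT}$$

where $\imath = \sqrt{-1}$.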
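The following Python code is a direct (unoptimized) rendering of the DFT and IDFT formulas above. It is used to confirm that the componentwise product of DFT vectors yields the GF(q) convolution. Indices are digit vectors in $\{0, \ldots, p-1\}^m$. The choice $p = 3$, $m = 2$ and the helper names are illustrative assumptions.

```python
import cmath
import itertools
import random

p, m = 3, 2                                          # assumption: GF(9)
q = p ** m
IDX = list(itertools.product(range(p), repeat=m))    # position k in the list <-> digit vector IDX[k]
POS = {vec: k for k, vec in enumerate(IDX)}

def phase(beta, alpha, sign):
    return cmath.exp(sign * 2j * cmath.pi / p * sum(b * a for b, a in zip(beta, alpha)))

def mdft(x):
    """d_beta = sum_alpha x_alpha * exp(-i 2pi/p * sum_k beta_k alpha_k)."""
    return [sum(x[POS[al]] * phase(be, al, -1) for al in IDX) for be in IDX]

def midft(d):
    """x_alpha = (1/q) * sum_beta d_beta * exp(+i 2pi/p * sum_k beta_k alpha_k)."""
    return [sum(d[POS[be]] * phase(be, al, +1) for be in IDX) / q for al in IDX]

def conv_direct(x1, x2):
    """(16): cyclic convolution over digit vectors."""
    return [sum(x1[POS[al]] * x2[POS[tuple((k - a) % p for k, a in zip(ka, al))]] for al in IDX)
            for ka in IDX]

rng = random.Random(5)
msgs = [[rng.random() for _ in range(q)] for _ in range(3)]   # stand-ins for r~(1), r~(2), r~(3)
via_dft = midft([a * b * c for a, b, c in zip(*map(mdft, msgs))])
direct = conv_direct(conv_direct(msgs[0], msgs[1]), msgs[2])
```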

Efficient computation of the m-DFT is possible by successively applying the single-dimensional DFT on each of 
the dimensions in turn, as shown in the following algorithm [11] [page 76]: 

Algorithm 3:

for i = 1 to m
    for each vector (α_1, ..., α_{i−1}, α_{i+1}, ..., α_m) ∈ {0, ..., p − 1}^{m−1}
        {d_{α_1,...,α_{i−1},α_i,α_{i+1},...,α_m}}_{α_i=0}^{p−1} ← 1-DFT({x_{α_1,...,α_{i−1},α_i,α_{i+1},...,α_m}}_{α_i=0}^{p−1})
    end
    if i ≠ m then x ← d
end
return d
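Algorithm 3 translates directly into array code. In the sketch below (function names are our own), each pass applies a p × p 1-DFT matrix along one axis, which processes all p^(m−1) slices of that axis at once; overwriting `d` between passes plays the role of the step "x ← d". The same routine with the conjugate matrix scaled by 1/p yields the m-IDFT.

```python
import numpy as np


def dft_matrix(p: int) -> np.ndarray:
    """p x p matrix of the 1-DFT: F[beta, alpha] = exp(-2j*pi*alpha*beta/p)."""
    idx = np.arange(p)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / p)


def m_dft(x: np.ndarray, p: int, m: int, inverse: bool = False) -> np.ndarray:
    """Algorithm 3: apply a length-p 1-DFT along each of the m dimensions in turn.

    With inverse=True the 1-IDFT (conjugate matrix scaled by 1/p) is used on every
    axis, which gives the m-IDFT with its overall 1/q factor."""
    F = dft_matrix(p)
    if inverse:
        F = np.conj(F) / p
    d = np.asarray(x, dtype=complex).reshape((p,) * m)
    for axis in range(m):
        # Contract F's alpha index with the current axis, then move the resulting
        # beta index back into position `axis`.
        d = np.moveaxis(np.tensordot(F, d, axes=([1], [axis])), 0, axis)
    return d.reshape(-1)
```

Each pass costs p^2 · p^(m−1) = p·q complex multiplications, so the m passes together cost m·p·q, in line with the count given in the text.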

At each iteration of the above algorithm, p^{m−1} 1-DFTs are computed. Each 1-DFT requires p^2 floating-point multiplications and p·(p − 1) floating-point additions (to compute all components), and thus the entire algorithm requires m·p^{m−1}·p^2 = m·p·q multiplications and m·(p − 1)·p^m = m·(p − 1)·q additions. The m-IDFT can be computed in a similar manner. Note that a further reduction in complexity could be obtained by using number-theoretic transforms, such as the Winograd FFT.

We can use these results to reduce the complexity of the leftbound computation at each check node, by first computing the m-DFTs of all rightbound messages and then using the DFT vectors to compute convolutions. The resulting complexity at each check node is now O(d_j·mpq + d_j(d_j − 1)·q). The first term of the sum is the computation of the m-DFTs and m-IDFTs; the second is the multiplication of the m-DFTs for all messages. This is a significant improvement in comparison to the straightforward approach.

Note that the m-DFT is particularly attractive when p = 2, i.e., when q = 2^m. The elements of the form e^{−j(2π/p)Σ_{i=1}^{m} α_iβ_i} become (−1)^{Σ_{i=1}^{m} α_iβ_i}. Thus, the floating-point multiplications are eliminated, and the DFT involves only additions and subtractions. The above complexity figure per check node thus becomes O(d_j·mq + d_j(d_j − 1)·q). Furthermore, all quantities are real-valued and no complex-valued arithmetic is needed.
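For p = 2 the m-DFT is the (unnormalized) Walsh–Hadamard transform, and the GF(2^m) convolution becomes an XOR convolution. The following sketch, with hypothetical function names, implements the fast transform with butterflies that use only real additions and subtractions, and uses it to convolve two vectors; the inverse transform is the same transform divided by q.

```python
import numpy as np


def walsh_hadamard(x) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform of a length-2^m vector.

    This equals the m-DFT for p = 2: d_beta = sum_alpha x_alpha * (-1)^popcount(alpha & beta).
    It uses m stages of q/2 butterflies, i.e. m*q real additions/subtractions."""
    d = np.array(x, dtype=float)
    q = d.size
    h = 1
    while h < q:
        for start in range(0, q, 2 * h):
            for i in range(start, start + h):
                a, b = d[i], d[i + h]
                d[i], d[i + h] = a + b, a - b
        h *= 2
    return d


def gf2m_convolve(x1, x2) -> np.ndarray:
    """GF(2^m) convolution z_k = sum_a x1_a * x2_(k XOR a); the inverse transform is WHT / q."""
    q = len(x1)
    return walsh_hadamard(walsh_hadamard(x1) * walsh_hadamard(x2)) / q
```

Since subtraction and addition coincide in GF(2^m), the index k − a in (16) is simply k XOR a here.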

An additional improvement, to an order of O(d_j·mpq + 3·d_j·q) (in the general case where p is not necessarily 2), can be achieved using a method suggested by Davey and MacKay [10]. This method produces a negligible improvement except at very high values of d_j, and is therefore not elaborated here.
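The text does not describe the Davey and MacKay method. One standard way to reach an order of 3·d_j·q multiplications for the second term, which we assume is close in spirit, is to form all leave-one-out products of the d_j DFT vectors at a check node using prefix and suffix products. The sketch below (hypothetical function name) shows that technique; it needs no divisions, so zero-valued DFT components are handled correctly.

```python
import numpy as np


def leave_one_out_products(dfts: np.ndarray) -> np.ndarray:
    """Row n of the result is the componentwise product of all rows of `dfts` except row n.

    dfts has shape (d_j, q): the transformed incoming messages at one check node.
    Prefix and suffix products need about 3*d_j vector multiplications in total,
    instead of d_j*(d_j - 1) for the direct approach."""
    d, q = dfts.shape
    prefix = np.ones((d, q), dtype=dfts.dtype)  # prefix[n] = product of rows 0..n-1
    suffix = np.ones((d, q), dtype=dfts.dtype)  # suffix[n] = product of rows n+1..d-1
    for n in range(1, d):
        prefix[n] = prefix[n - 1] * dfts[n - 1]
    for n in range(d - 2, -1, -1):
        suffix[n] = suffix[n + 1] * dfts[n + 1]
    return prefix * suffix
```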
C. Neighborhood Graphs and the Tree Assumption

Before we conclude this section, we briefly review the concepts of neighborhood graphs and the tree assumption. These concepts were developed in the context of standard binary LDPC codes and carry over to coset GF(q) LDPC codes as well.

Definition 3: (Richardson and Urbanke [28]) The neighborhood graph of depth d, spanned from an edge e, is 
the induced graph containing e and all edges and nodes on directed paths of length d that end with e. 
At iteration t, a rightbound message produced from a variable node i to a check node j is a vector of APP values for the code symbol at i, given the information observed in the neighborhood of e = (i, j) of depth 2t. Similarly, a leftbound message from j to i is based on the information observed in the neighborhood of e = (j, i) of depth 2t − 1.

The APP values produced by belief-propagation decoders are computed under the tree assumption†. We say that the tree assumption is satisfied at a node n in the context of computing a message x if the neighborhood graph on which the message is based is a tree. Asymptotically, at large block lengths N, the tree assumption is satisfied with high probability at any particular node [28].

At finite block lengths, the neighborhood graph frequently contains cycles and is therefore not a tree. Such cases are discussed in Appendix II. Nevertheless, simulation results indicate that the belief-propagation decoder performs remarkably well even when the tree assumption is not strictly satisfied.

V. COSET GF(q) LDPC ANALYSIS IN A RANDOM-COSET SETTING 

One important aid in the analysis of coset GF(q) LDPC codes is the randomly selected coset vector that was used in their construction. Rather than examine the decoder of a single coset GF(q) LDPC code, we focus on a set of codes. That is, given a fixed GF(q) LDPC code C and a mapping δ(·), we consider the behavior of a coset GF(q) LDPC code constructed using a randomly selected coset vector v. We refer to this as random-coset analysis.

With this approach, the random space consists of random channel transitions as well as random realizations of 
the coset vector v. The random coset vector produces an effect that is similar to output-symmetry that is usually 
required in the analysis of standard LDPC codes [28], [29]. Note that although v is random, it is assumed to have 
been selected in advance and is thus known to the decoder. 

Unlike the coset vector, the underlying GF(q) LDPC code is kept fixed in this section. In Section VI we will consider several of these concepts in the context of selecting the underlying LDPC code at random from an ensemble.

A. The All-Zero Codeword Assumption 

An important property of standard binary LDPC decoders [28] is that the probability of decoding error is equal 
for any transmitted codeword. This property is central to many analysis methods, and enables conditioning the 
analysis on the assumption that the all-zero‡ codeword was transmitted.

†In [28], the tree assumption is called the independence assumption.

‡In [28] a BPSK alphabet is used, and thus the all-zero codeword is referred to as the "all-one" codeword.
With coset GF(q) LDPC codes, we have the following lemma.

Lemma 1: Assume a discrete memoryless channel. Consider the analysis, in a random-coset setting, of a coset GF(q) LDPC code constructed from a fixed GF(q) LDPC code C. For each c ∈ C, let P_e(c) denote the conditional (bit or block) probability of decoding error after iteration t, assuming the codeword δ(c + v) was sent, averaged over all possible values of the coset vector v. Then P_e(c) is independent of c.
The proof of the lemma is provided in Appendix III-B.

Lemma 1 enables us to condition our analysis on the assumption that the transmitted codeword corresponds to the all-zero codeword of the underlying LDPC code.

B. Symmetry of Message Distributions 

The symmetry property, introduced by Richardson and Urbanke [29], is a major tool in the analysis of standard binary LDPC codes. In this section we generalize its definition to q-ary random variables as used in the analysis of coset GF(q) LDPC decoders. We provide two versions of the definition, the first using probability-vector random variables and the second using LLR-vector random variables.

Definition 4: A probability-vector random variable X is symmetric if for any probability vector x, the following expression holds:

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{X} \in \mathbf{x}^*] = x_0 \cdot n(\mathbf{x}) \tag{18}$$

where x* and n(x) are as defined in Section II.

In the context of LLR-vector random variables, we have the following lemma. 

Lemma 2: Let W be an LLR-vector random variable. The random variable X = LLR^{−1}(W) is symmetric if and only if W satisfies

$$\Pr[\mathbf{W} = \mathbf{w}] = e^{w_i} \Pr[\mathbf{W} = \mathbf{w}^{+i}] \tag{19}$$

for all LLR vectors w and all i ∈ GF(q).

The proof of this lemma is provided in Appendix III-C. In the sequel, we adopt the lemma as a definition of symmetry when discussing variables in LLR representation. Note that in the simple case of q = 2, the LLR vector degenerates to a scalar value w, and we have w^{+1} = −w. Thus, (19) becomes

$$\Pr[W = w] = e^{w} \Pr[W = -w] \tag{20}$$

This coincides with symmetry for binary codes as defined in [29].
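A familiar binary instance of (20) is a Gaussian LLR whose variance equals twice its mean: for a Gaussian density f with mean m and variance σ², f(w)/f(−w) = exp(2mw/σ²), which equals e^w exactly when σ² = 2m. The short sketch below (function names are our own) checks (20) on a grid of w values and shows that a Gaussian with a different mean-to-variance ratio fails it.

```python
import numpy as np


def gaussian_pdf(w, mean: float, var: float):
    """Scalar Gaussian density evaluated elementwise."""
    return np.exp(-(w - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def satisfies_binary_symmetry(mean: float, var: float) -> bool:
    """Check (20), f(w) = e^w f(-w), for a Gaussian LLR density on a grid of w values."""
    w = np.linspace(-8.0, 8.0, 401)
    lhs = gaussian_pdf(w, mean, var)
    rhs = np.exp(w) * gaussian_pdf(-w, mean, var)
    return bool(np.allclose(lhs, rhs, rtol=1e-9, atol=0.0))
```

For example, satisfies_binary_symmetry(1.0, 2.0) holds, whereas satisfies_binary_symmetry(2.0, 2.0) does not.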
We now examine the message produced at a node n. 

Theorem 1: Assume a discrete memoryless channel and consider a coset GF(q) LDPC code constructed in a random-coset setting from a fixed GF(q) LDPC code C. Let X denote the message produced at a node n of the Tanner graph of C (and of the coset GF(q) LDPC code), at some iteration of belief-propagation decoding. Let the tree assumption be satisfied at n. Then, under the all-zero codeword assumption, the random variable X is symmetric.

The proof of the theorem is provided in Appendix III-D.
Fig. 4. Equivalent channel model for coset GF(q) LDPC codes.
C. Channel Equivalence 

Simple GF(q) LDPC codes, although unsuitable for arbitrary channels, are simpler to analyze than coset GF(q) LDPC codes and decoders. Fig. 4 presents the structure of coset GF(q) LDPC encoding/decoding. x is the transmitted symbol (of the underlying code) and v is the coset symbol; u = x + v (evaluated over GF(q)) is the input to the mapper, x' = δ(u) is the mapper's output and y' is the physical channel's output. y will be discussed shortly.

Comparing a coset GF(q) LDPC decoder with the decoder of its underlying GF(q) LDPC code, we may observe that a difference exists only in the computation of the initial messages r^(0). The messages r^(0) are APP values corresponding to a single channel observation. After they are computed, both decoders proceed in exactly the same way. It would thus be desirable to abstract the operations that are unique to coset GF(q) LDPC codes into the channel, and examine an equivalent model that employs simple GF(q) LDPC codes and decoders.

Consider the channel obtained by encapsulating the addition of a random coset symbol, the mapping and the computation of the APP values into the channel model. The input to the channel is a symbol x from the code alphabet* and the output is a probability vector y = r^(0) of APP values. The decoder of a GF(q) LDPC code, if presented with y as raw channel output, would first compute a new vector of APP values. We will soon show that the computed vector would in fact be identical to y.

We begin with the following definition: 

Definition 5: Let Pr[y | x] denote the transition probabilities of a channel whose input alphabet is GF(q) and whose output alphabet consists of q-dimensional probability vectors. Then the channel is cyclic-symmetric if there exists a probability function Q(y*) (defined over sets of probability vectors) such that

$$\Pr[\mathbf{Y} = \mathbf{y} \mid x = i] = y_i \cdot n(\mathbf{y}) \cdot Q(\mathbf{y}^*) \tag{21}$$
Lemma 3: Assume a cyclic-symmetric channel. Let APP(y) denote the APP values for the channel output y. 
Then APP(y) = y. 

The proof of this lemma is provided in Appendix III-F. Returning to the context of our equivalent model, we have the following lemma:

*In most cases of interest, x will be a symbol from a GF(q) LDPC codeword. However, in this section we also consider the general, theoretical case, where the input to the channel is an arbitrary GF(q) symbol.
Lemma 4: The equivalent channel of Fig. 4 is cyclic-symmetric.
The proof of this lemma is provided in Appendix III-G.

Once the initial messages are computed, the performance of both the coset GF(q) LDPC and the GF(q) LDPC decoding algorithms is a function of these messages alone. Therefore, the performance of a coset GF(q) LDPC decoder in a random-coset setting over the original physical channel is identical to the performance of the underlying GF(q) LDPC decoder over the equivalent channel. This result enables us to shift our discussion from coset GF(q) LDPC codes over arbitrary channels to GF(q) LDPC codes over cyclic-symmetric channels.

Note that a cyclic-symmetric channel is symmetric in the sense defined by Gallager [17][page 94]. Hence its capacity-achieving distribution is uniform. This indicates that GF(q) LDPC codes, which have an approximately uniformly distributed code spectrum (see [1]), are well suited to it.

We now relate the capacity of the equivalent channel to that of the physical channel. More precisely, we show that the equivalent channel's capacity is equal to the equiprobable-signalling capacity of the physical channel with the mapping δ(·), denoted C_δ and defined below. Let U, X' and Y' be random variables corresponding to u, x' and y' in Fig. 3. Y' is related to X' = δ(U) through the physical channel's transition probabilities. Assume that U is uniformly distributed in {0, ..., q − 1}; then we define C_δ by C_δ = I(U; Y'). C_δ is equal to the capacity of transmission over the physical channel with an input alphabet {δ(i)}_{i=0}^{q−1} using a code whose codewords were generated by random uniform selection.
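For a discrete-output physical channel, C_δ = I(U; Y') can be evaluated directly from the transition matrix. The sketch below is an illustration under our own conventions: the function name is hypothetical, the channel is given as a NumPy matrix with `channel[s, y] = Pr[y' = y | x' = s]`, and the result is in bits (divide by log2 q for the base-q units used later for EXIT charts). For continuous outputs such as AWGN, the sum over y would become an integral.

```python
import numpy as np


def equiprobable_capacity(channel: np.ndarray, delta: list) -> float:
    """C_delta = I(U; Y') in bits for U uniform over the q symbols and X' = delta(U).

    channel[s, y] = Pr[y' = y | x' = s] for a discrete-output physical channel."""
    p_y_given_u = channel[delta, :]          # row u holds Pr[y' | delta(u)]
    p_y = p_y_given_u.mean(axis=0)           # output distribution under uniform U
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p_y_given_u * np.log2(p_y_given_u / p_y)
    return float(np.nansum(terms) / len(delta))  # 0 * log 0 terms are dropped as NaN
```

As sanity checks, a noiseless 4-input channel gives 2 bits, and a binary symmetric channel with crossover probability ε gives 1 − h(ε).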

Lemma 5: The capacity of the equivalent channel of Fig. 4 is equal to C_δ.
The proof of this lemma is provided in Appendix III-H.

Finally, the following lemma can be viewed as a generalization of the Channel Equivalence Lemma of [29]. 

Lemma 6: Let P(y) be the probability function of a symmetric probability-vector random variable. Consider the cyclic-symmetric channel whose transition probabilities are given by Pr[y | x = i] = P(y^{+i}). Then, assuming that the symbol zero is transmitted over this cyclic-symmetric channel, the initial messages of a GF(q) LDPC decoder are distributed as P(y).

The proof of this lemma is straightforward from Definitions 4 and 5 and from Lemma 3. We will refer to the cyclic-symmetric channel defined in Lemma 6 as the equivalent channel corresponding to P(y).

Remark 1: Note that Lemma 6 remains valid if we switch to LLR representation. That is, we replace y with its LLR equivalent w = LLR(y) and define Pr[w | x = i] = P(w^{+i}) (where w^{+i} denotes the LLR representation of y^{+i}).

VI. ANALYSIS OF DENSITY EVOLUTION

In this section we consider density evolution for coset GF(q) LDPC codes and its analysis. The precise computation of the coset GF(q) LDPC version of the algorithm is generally not possible in practice. The algorithm is, however, valuable as a reference for analysis purposes. We begin by defining density evolution in Section VI-A and examine the application of the concentration theorem of [28] and of symmetry to it. We proceed in Section VI-B to consider permutation-invariance, which is an important property of the densities tracked by the algorithm. We then apply permutation-invariance in Section VI-C to generalize the stability property to coset GF(q) LDPC codes and in Section VI-D to obtain an approximation of density evolution under a Gaussian assumption.

A. Density Evolution 

The definition of coset GF(q) LDPC density evolution is based on that of binary LDPC codes. The description below is intended for completeness of this text, and focuses on the differences that are unique to coset GF(q) LDPC codes. The reader is referred to [28] and [29] for a complete rigorous development.

Density evolution tracks the distributions of messages produced in belief propagation, averaged over all possible neighborhood graphs on which they are based. The random space consists of random channel transitions, the random selection of the code from a (λ, ρ, δ) coset GF(q) LDPC ensemble (see Section III-D) and the random selection of an edge from the graph. The random space does not include the transmitted codeword, which is assumed to be fixed at the all-zero codeword (following the discussion of Section V-A). We denote by R^(0) the initial message across the edge, by R_t the rightbound message at iteration t and by L_t the leftbound message at iteration t. The neighborhood graph associated with R_t and L_t is always assumed to be tree-like, and the case that it is not is neglected.

We will use the above notation when discussing the plain-likelihood representation of density evolution. When using LLR-vector representation, we let R'^(0), R'_t and L'_t denote the LLR-vector representations of R^(0), R_t and L_t. To simplify our notation, we assume that all random variables are discrete-valued and thus track their probability functions rather than their densities. The following discussion focuses on plain-likelihood representation. The translation to LLR representation is straightforward.

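1) The initial message. The probability function of R^(0) is computed in the following manner:

$$\Pr[\mathbf{R}^{(0)} = \mathbf{x}] = \sum_{y \in \mathcal{Y},\, v \in GF(q) \,:\, \mathbf{r}^{(0)}(y, v) = \mathbf{x}} \Pr[Y = y, V = v]$$

where Y and V are random variables denoting the channel output and coset-vector components, 𝒴 is the channel output alphabet, and the components of r^(0)(y, v) are defined as for the initial messages of the decoder, replacing y_i and v_i with y and v. The expression is equal to

$$\Pr[\mathbf{R}^{(0)} = \mathbf{x}] = \frac{1}{q} \sum_{y \in \mathcal{Y},\, v = 0,\ldots,q-1 \,:\, \mathbf{r}^{(0)}(y, v) = \mathbf{x}} \Pr[y \text{ was received} \mid \delta(v) \text{ was transmitted}]$$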
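For a small discrete channel, the probability function of R^(0) can be enumerated exactly. The sketch below works over GF(4), whose elements we label 0 to 3 so that addition is bitwise XOR, and uses a hypothetical 4-input, 3-output channel matrix `CHANNEL` and a hypothetical mapping `DELTA`. It assumes that r^(0)(y, v) is the vector of likelihoods Pr[y | δ(k + v)], k ∈ GF(4), normalized to sum to one (APP values under a uniform prior). Exact fractions avoid spurious splitting of equal vectors.

```python
from collections import defaultdict
from fractions import Fraction

Q = 4  # GF(4); elements are 2-bit labels and addition is bitwise XOR

# Hypothetical 4-input, 3-output DMC: CHANNEL[s][y] = Pr[y | channel input s].
CHANNEL = [
    [Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)],
    [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)],
    [Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)],
    [Fraction(3, 10), Fraction(3, 10), Fraction(4, 10)],
]
DELTA = [0, 2, 1, 3]  # hypothetical mapping delta: GF(4) symbol -> channel input


def initial_message(y: int, v: int) -> tuple:
    """Assumed form of r^(0)(y, v): likelihoods Pr[y | delta(k + v)], k in GF(4), normalized."""
    lik = [CHANNEL[DELTA[k ^ v]][y] for k in range(Q)]
    total = sum(lik)
    return tuple(l / total for l in lik)


def initial_message_pmf() -> dict:
    """Pr[R^(0) = x] = (1/q) * sum over {(y, v): r^(0)(y, v) = x} of Pr[y | delta(v)]."""
    pmf = defaultdict(Fraction)
    for v in range(Q):
        for y in range(len(CHANNEL[0])):
            pmf[initial_message(y, v)] += Fraction(1, Q) * CHANNEL[DELTA[v]][y]
    return dict(pmf)


def shift(x: tuple, i: int) -> tuple:
    """x^{+i} with (x^{+i})_k = x_{k+i} (assumed convention); in GF(4), k + i = k XOR i."""
    return tuple(x[k ^ i] for k in range(Q))
```

One can verify directly from this construction (substituting v → v + i) that Pr[R^(0) = x^{+i}]·x_0 = Pr[R^(0) = x]·x_i for every x and i, which is how the symmetry of the initial message shows up in this small example.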

2) Leftbound messages. L_t is obtained from (10). The rightbound messages in (10) are replaced by independent random variables, distributed as R_{t−1}. Similarly, the labels in (10) are replaced by independent random variables uniformly distributed in GF(q)\{0}.

Formally, let d be the maximal right-degree. Then for each d_j = 2, ..., d we first define

$$\Pr[\mathbf{L}_t^{(d_j)} = \mathbf{x}] = \sum_{\substack{\mathbf{r}^{(1)},\ldots,\mathbf{r}^{(d_j-1)} \in \mathcal{P},\; g_1,\ldots,g_{d_j} \in GF(q)\setminus\{0\} \,:\\ \mathbf{l}(\mathbf{r}^{(1)},\ldots,\mathbf{r}^{(d_j-1)},\, g_1,\ldots,g_{d_j}) = \mathbf{x}}} \prod_{n=1}^{d_j} \Pr[G_n = g_n] \cdot \prod_{n=1}^{d_j-1} \Pr[\mathbf{R}_{t-1} = \mathbf{r}^{(n)}]$$

where 𝒫 is the set of all probability vectors, and the components of l(r^(1), ..., r^(d_j−1), g_1, ..., g_{d_j}) are defined as in (10). G_n is a random variable corresponding to the nth label, and thus Pr[G_n = g] = 1/(q − 1) for all g ∈ GF(q)\{0}. Pr[R_{t−1} = r^(n)] is obtained recursively from the previous iteration of belief propagation.
The probability function of L_t is now obtained by

$$\Pr[\mathbf{L}_t = \mathbf{x}] = \sum_{d_j=2}^{d} \rho_{d_j} \cdot \Pr[\mathbf{L}_t^{(d_j)} = \mathbf{x}]$$

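3) Rightbound messages. The probability function of R_0 is equal to that of R^(0). For t > 0, R_t is obtained from the rightbound-message computation of the decoder. The leftbound messages and the initial message in that computation are replaced by independent random variables, distributed as L_t and R^(0), respectively.

Formally, let c be the maximal left-degree. Then for each d_i = 2, ..., c we first define

$$\Pr[\mathbf{R}_t^{(d_i)} = \mathbf{x}] = \sum_{\substack{\mathbf{r}^{(0)}, \mathbf{l}^{(1)},\ldots,\mathbf{l}^{(d_i-1)} \in \mathcal{P} \,:\\ \mathbf{r}(\mathbf{r}^{(0)}, \mathbf{l}^{(1)},\ldots,\mathbf{l}^{(d_i-1)}) = \mathbf{x}}} \Pr[\mathbf{R}^{(0)} = \mathbf{r}^{(0)}] \cdot \prod_{n=1}^{d_i-1} \Pr[\mathbf{L}_t = \mathbf{l}^{(n)}]$$

where the components of r(r^(0), l^(1), ..., l^(d_i−1)) are defined as in the rightbound-message computation, Pr[R^(0) = r^(0)] is given by step 1, and Pr[L_t = l^(n)] is obtained recursively from the previous steps of density evolution.
The probability function of R_t is now obtained by

$$\Pr[\mathbf{R}_t = \mathbf{x}] = \sum_{d_i=2}^{c} \lambda_{d_i} \cdot \Pr[\mathbf{R}_t^{(d_i)} = \mathbf{x}]$$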

Theoretically, the above algorithm is sufficient to compute the desired densities. In practice, a major problem is that the amount of memory required to store the probability density of a q-dimensional message grows exponentially with q. For instance, with 100 quantization levels per dimension, the amount of memory required for a 7-ary code is of the order of 100^6. Hence, unless an alternative method for describing the densities is found, the algorithm is not realizable. It is noteworthy, however, that the algorithm can be approximated using Monte Carlo simulations.

We now discuss the probability that a message examined in density evolution is erroneous, that is, that the message corresponds to an incorrect decision regarding the variable node to which it is directed or from which it was sent. Under the all-zero codeword assumption, the true transmitted code symbol (of the underlying LDPC code) at the relevant variable node is assumed to be zero.

We first assume that the message is a fixed probability vector x. Suppose x_0 is greater than all other elements x_i, i = 1, ..., q − 1. Given the decision criterion used by the belief-propagation decoder, described in Section IV-A, the decoder will correctly decide zero. Similarly, if there exists an index i ≠ 0 such that x_i > x_0, then the decoder will decide incorrectly. However, if the maximum is achieved at x_0 as well as at k − 1 other indices, the decoder will correctly decide zero with probability 1/k.

Definition 6: Given a probability vector x, P_e(x) is the probability of error in a decision according to the vector x.

Thus, for example, P_e([1/2, 1/4, 1/4, 0]) = 0, P_e([1/4, 1/2, 1/4, 0]) = 1 and P_e([3/10, 3/10, 3/10, 1/10]) = 2/3.
"Quantization" here means the operation performed by a discrete quantizer, not in the context of Definition 2.
Given a random variable X, we define

$$P_e(\mathbf{X}) = \sum_{\mathbf{x}} P_e(\mathbf{x}) \Pr[\mathbf{X} = \mathbf{x}] \tag{22}$$

where the sum is over all probability vectors.
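Definition 6 and (22) are easy to state in code. In the following sketch (hypothetical function names), the true symbol is 0 under the all-zero codeword assumption, the decoder picks an index of maximal probability, and ties are broken uniformly at random. Exact fractions are used so that ties are detected reliably; the examples given after Definition 6 serve as checks.

```python
from fractions import Fraction


def pe_vector(x) -> Fraction:
    """P_e(x) of Definition 6, assuming the true symbol is 0 and ties are broken uniformly."""
    best = max(x)
    winners = [i for i, xi in enumerate(x) if xi == best]
    if 0 not in winners:
        return Fraction(1)
    return 1 - Fraction(1, len(winners))


def pe_random(pmf: dict) -> Fraction:
    """P_e(X) of (22) for a discrete random vector given as {probability vector: Pr[X = x]}."""
    return sum((pe_vector(x) * px for x, px in pmf.items()), Fraction(0))
```

For instance, pe_vector of [3/10, 3/10, 3/10, 1/10] returns 2/3, since zero wins a three-way tie with probability 1/3.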

Consider P_e(R_t). This corresponds to the probability of error at a randomly selected edge at iteration t. Richardson and Urbanke [28] proved a concentration theorem that states that as the block length N approaches infinity, the bit error rate at iteration t converges to a similarly defined probability of error. The convergence is in probability, exponentially in N. Replacing bit error rate with symbol error rate, this theorem carries over to coset GF(q) LDPC density evolution unchanged.

Let P_e^t = P_e(R_t) be the sequence of error probabilities produced by density evolution. A desirable property of this sequence is given by the following theorem.

Theorem 2: P_e^t is nonincreasing with t.
The proof of this theorem is similar to that of Theorem 7 of [29] and is omitted. 

Finally, in Section V-B we considered symmetry in the context of the message corresponding to a fixed underlying GF(q) LDPC code and across a fixed edge of its Tanner graph. We now consider its relevance in the context of density evolution, which assumes a random underlying LDPC code and a random edge.

Theorem 3: The random variables R^(0), R_t and L_t (for all t) are symmetric.
The proof of this theorem is provided in Appendix IV-A.



B. Permutation-Invariance Induced by Labels 



Permutation-invariance is a key property of coset GF(q) LDPC codes that allows the approximation of their densities using one-dimensional functionals, thus greatly simplifying their analysis. The definition is based on the permutation, induced by the operation x^{×g}, on the elements of a probability vector.

Before we provide the definition, let us consider (10), by which a leftbound message l is computed in the process of belief-propagation decoding. Let h ∈ GF(q)\{0}, and consider l^{×h}:

$$\mathbf{l}^{\times h} = \left[\left(\bigotimes_{n=1}^{d_j-1} \left(\mathbf{r}^{(n)}\right)^{\times g_n}\right)^{\times(-g_{d_j}^{-1})}\right]^{\times h} \tag{23}$$

With density evolution, the label g_{d_j} is a random variable, independent of the other labels and of the rightbound messages {R_{t−1}^{(n)}}, and consequently of ⊗_{n=1}^{d_j−1} (R_{t−1}^{(n)})^{×g_n}. Similarly, g_{d_j}·h (where h is fixed) is distributed identically with g_{d_j}, and is independent of ⊗_{n=1}^{d_j−1} (R_{t−1}^{(n)})^{×g_n}. Thus, the random variable L^{×h} is distributed identically with L.
This leads us to the following definition:
This leads us to the following definition: 

Definition 7: A probability-vector random variable X is permutation-invariant if for any fixed h ∈ GF(q)\{0}, the random variable S = X^{×h} is distributed identically with X.

Although this definition assumes plain-likelihood representation, it carries over straightforwardly to LLR representation, and the following lemma is easy to verify:

Lemma 7: Let W be an LLR-vector random variable and X = LLR^{−1}(W). Then X is permutation-invariant if and only if, for any fixed h ∈ GF(q)\{0}, the random variable W^{×h} is distributed identically with W.

To give an idea of why permutation-invariance is so useful, we now present two important lemmas involving 
permutation-invariant random variables. Both lemmas examine marginal random variables. The first lemma is valid 
for both probability-vector and LLR-vector representation. 

Lemma 8: Let X (W) be a probability-vector (LLR-vector) random variable. If X (W) is permutation-invariant, then for any i, j = 1, ..., q − 1, the random variables X_i and X_j (W_i and W_j) are identically distributed.
The proof of this lemma is provided in Appendix IV-B.

Lemma 9: Let W be a symmetric LLR-vector random variable. Assume that W is also permutation-invariant. Then for all k = 1, ..., q − 1, W_k is symmetric in the binary sense, as defined by (20).

Note that this lemma does not apply to plain-likelihood representation. The proof of the lemma is provided in Appendix IV-C. Consider the following definition.

Definition 8: Given a probability-vector random variable X, we define the random-permutation of X, denoted X̃, as the random variable equal to X^{×g}, where g is randomly selected from GF(q)\{0} with uniform probability and is independent of X.

The definition with LLR-vector representation is identical. The following lemma links permutation-invariance with 
random-permutation. 

Lemma 10: A probability-vector (LLR-vector) random variable X (W) is permutation-invariant if and only if there exists a probability-vector (LLR-vector) random variable T (S) such that X = T̃ (W = S̃).
In Appendix IV-E we present some additional useful lemmas that involve permutation-invariance.
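Lemma 10 and Lemma 8 can be illustrated exactly on a small example. The sketch below takes q = 5 (a prime, so GF(5) multiplication is integer multiplication mod 5) and assumes the convention (x^{×h})_k = x_{k·h}; that convention is defined outside this excerpt, so treat it as an assumption. It builds the random-permutation of a hypothetical non-invariant random vector T and checks that the result is permutation-invariant and has identically distributed nonzero components. All names are our own.

```python
from collections import defaultdict
from fractions import Fraction

Q = 5  # prime, so multiplication in GF(5) is integer multiplication mod 5


def times(x: tuple, h: int) -> tuple:
    """x^{xh}, assuming the convention (x^{xh})_k = x_{k*h}."""
    return tuple(x[(k * h) % Q] for k in range(Q))


def push_forward(pmf: dict, h: int) -> dict:
    """Distribution of X^{xh} when X ~ pmf (x -> x^{xh} is a bijection on vectors)."""
    return {times(x, h): p for x, p in pmf.items()}


def random_permutation(pmf: dict) -> dict:
    """Distribution of X~ = X^{xg}, g uniform on GF(q)\\{0} and independent of X (Definition 8)."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        for g in range(1, Q):
            out[times(x, g)] += p / (Q - 1)
    return dict(out)


def is_permutation_invariant(pmf: dict) -> bool:
    """Definition 7: X^{xh} has the same distribution as X for every nonzero h."""
    return all(push_forward(pmf, h) == pmf for h in range(1, Q))


def marginal(pmf: dict, i: int) -> dict:
    """Distribution of the single component X_i."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[x[i]] += p
    return dict(out)
```

The invariance holds because applying ×h to X^{×g} gives X^{×gh}, and gh is again uniform over the nonzero elements.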

Finally, the following theorem discusses the relevance of permutation-invariance to the distributions tracked by density evolution.

Theorem 4: Let R^(0), R_t and L_t be defined as in Section VI-A. Then,

1) L_t is permutation-invariant.

2) Let R̂_t = (R_t)^{×g}, where g is the label on the edge associated with the message. Then R̂_t is symmetric, permutation-invariant and satisfies P_e(R̂_t) = P_e(R_t).

3) Let R̃^(0) be a random-permutation of R^(0). Then replacing R^(0) by R̃^(0) in the computation of density evolution does not affect the densities of L_t and R̂_t. The random variable R̃^(0) is symmetric, permutation-invariant and satisfies P_e(R̃^(0)) = P_e(R^(0)).

The proof of this theorem is provided in Appendix IV-F. Although not all distributions involved in density evolution are permutation-invariant, Theorem 4 enables us to focus our attention on permutation-invariant random variables alone. Our interest in the distribution of the rightbound message R_t is confined to the error probability implied by it. Thus we may instead examine R̂_t. Similarly, our interest in the initial message R^(0) is confined to its effect on the distributions of R̂_t and L_t. Thus we may instead examine R̃^(0).
C. Stability

The stability condition, introduced by Richardson et al. [29], is a necessary and sufficient condition for the 
probability of error to approach arbitrarily close to zero, assuming it has already dropped below some value at 
some iteration. Thus, this condition is an important aid in the design of LDPC codes with low error floors. In this 
section we generalize the stability condition to coset GF(q) LDPC codes.

Given a discrete memoryless channel with transition probabilities Pr[y | x] and a mapping δ(·), we define the following channel parameter:

$$\Delta = \frac{1}{q(q-1)} \sum_{i \in GF(q)} \sum_{\substack{j \in GF(q) \\ j \neq i}} \sum_{y} \sqrt{\Pr[y \mid \delta(i)] \Pr[y \mid \delta(j)]} \tag{24}$$

For example, consider an AWGN channel with a noise variance of σ². Δ for this case is obtained in a similar manner to that of [29][Example 12].
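For a real-valued AWGN channel with noise variance σ², the inner sum in (24) becomes a Bhattacharyya integral, and for two Gaussians with a common variance σ² and means a and b that integral equals exp(−(a − b)²/(8σ²)). The sketch below uses this closed form (our own derivation, not quoted from the text) and cross-checks it against direct numerical integration. The function names and the 4-PAM points in the check are hypothetical.

```python
import numpy as np


def stability_delta_awgn(points, sigma2: float) -> float:
    """Delta of (24) for a real AWGN channel with signal points delta(0), ..., delta(q-1):
    the average over ordered pairs i != j of exp(-(delta(i) - delta(j))^2 / (8 sigma^2))."""
    pts = np.asarray(points, dtype=float)
    q = pts.size
    b = np.exp(-((pts[:, None] - pts[None, :]) ** 2) / (8.0 * sigma2))
    return float((b.sum() - q) / (q * (q - 1)))   # drop the i == j terms, each equal to 1


def stability_delta_numeric(points, sigma2: float, n: int = 200001) -> float:
    """Same quantity by numerically integrating sqrt(Pr[y | delta(i)] Pr[y | delta(j)]) over y."""
    pts = np.asarray(points, dtype=float)
    q = pts.size
    s = np.sqrt(sigma2)
    y = np.linspace(pts.min() - 12 * s, pts.max() + 12 * s, n)
    dy = y[1] - y[0]
    pdf = np.exp(-((y[None, :] - pts[:, None]) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    total = 0.0
    for i in range(q):
        for j in range(q):
            if i != j:
                total += np.sum(np.sqrt(pdf[i] * pdf[j])) * dy
    return total / (q * (q - 1))
```

For q = 2 this reduces to the Bhattacharyya parameter of the binary-input AWGN channel, which is the quantity appearing in the binary stability condition of [29].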

In Appendix IV-G we present the concept of non-degeneracy for mappings δ(·) and channels (taken from [1]). Under these assumptions, Δ is strictly smaller than 1. We assume these non-degeneracy definitions in the following theorem.

We are now ready to state the stability condition for coset GF(q) LDPC codes:

Theorem 5: Assume we are given the triplet (λ, ρ, δ) for a coset GF(q) LDPC ensemble designed for the above discrete memoryless channel. Let P_0 denote the probability distribution function of R^(0), the initial message of density evolution. Let P_e^t = P_e(R̂_t) denote the average probability of error at iteration t under density evolution.

Assume E[exp(s · R'^(0)_1)] < ∞ in some neighborhood of zero (where R'^(0)_1 denotes element 1 of the LLR representation R'^(0) of R^(0)). Then

1) If λ'(0)ρ'(1) > 1/Δ, then there exists a positive constant ξ = ξ(ρ, λ, P_0) such that P_e^t > ξ for all iterations t.

2) If λ'(0)ρ'(1) < 1/Δ, then there exists a positive constant ξ = ξ(ρ, λ, P_0) such that if P_e^t < ξ at some iteration t, then P_e^t approaches zero as t approaches infinity.
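Checking the condition of Theorem 5 requires only λ'(0) and ρ'(1). With edge-perspective degree distributions λ(x) = Σ_i λ_i x^(i−1) and ρ(x) = Σ_j ρ_j x^(j−1), these are λ'(0) = λ_2 and ρ'(1) = Σ_j ρ_j (j − 1). The sketch below (hypothetical function names, with degree distributions given as dictionaries from degree to edge fraction) classifies an ensemble for a given Δ; the boundary case λ'(0)ρ'(1) = 1/Δ is not covered by the theorem.

```python
def stability_product(lam: dict, rho: dict) -> float:
    """lambda'(0) * rho'(1) for edge-perspective degree distributions.

    lam[i] (rho[j]) is the fraction of edges attached to variable (check) nodes of
    degree i (j), so lambda(x) = sum_i lam[i] x^(i-1) and rho(x) = sum_j rho[j] x^(j-1)."""
    lam_prime_at_0 = lam.get(2, 0.0)                           # only the x^1 term survives at 0
    rho_prime_at_1 = sum(r * (j - 1) for j, r in rho.items())
    return lam_prime_at_0 * rho_prime_at_1


def stability_verdict(lam: dict, rho: dict, delta: float) -> str:
    """Classify according to Theorem 5."""
    prod = stability_product(lam, rho)
    if prod < 1.0 / delta:
        return "stable"        # Part 2: small error probabilities are driven to zero
    if prod > 1.0 / delta:
        return "unstable"      # Part 1: error probability bounded away from zero
    return "undetermined"      # boundary case, not covered by the theorem
```

For instance, a hypothetical ensemble with λ(x) = 0.3x + 0.7x² and ρ(x) = x⁵ has λ'(0)ρ'(1) = 1.5, so it satisfies the condition only for channels with Δ < 2/3. Ensembles without degree-2 variable nodes have λ'(0) = 0 and always pass.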

Note that the requirement E[exp(s · R'^(0)_1)] < ∞ is typically satisfied in channels of interest. The proof of Part 1 of the theorem is provided in Appendix V and the proof of Part 2 is provided in Appendix VI. Outlines of both proofs are provided below.

The proof of Part 1 is a generalization of a proof provided by Richardson et al. [29]. The proof in [29] begins by observing that since the distributions at some iteration t are symmetric, they may equivalently be modelled as APP values corresponding to the outputs of an MBIOS channel. By an erasure decomposition lemma, an MBIOS channel can be modelled as a degraded version of an erasure channel. The proof proceeds by replacing the distributions at iteration t by their erasure-channel equivalents, and shows that the probability of error with the new distributions is lower-bounded by some nonzero constant. Since the true MBIOS channel is a degraded version of the erasure channel, the true probability of error must be lower-bounded by the same nonzero constant as well.
Returning to the context of coset GF(q) LDPC codes, we first observe that by Theorem 3 the random variable R_t at iteration t is symmetric, and hence by Lemma 6 it can be modelled as APP values of the outputs of a cyclic-symmetric channel. We then show that any cyclic-symmetric channel can be modelled as a degraded erasurized channel, appropriately defined. The continuation of the proof follows along the lines of [29].

The proof of Part 2 is a generalization of a proof by Khandekar [20]. As in [20] (and also [6]), our proof tracks a one-dimensional functional of the distribution of a message X, denoted D(X). We show that the rightbound messages at two consecutive iterations satisfy D(R̂_{t+1}) ≤ Δ · λ(1 − ρ(1 − D(R̂_t)) + O(D(R̂_t)²)). Using first-order Taylor expansions of λ(·) and ρ(·), we proceed to show D(R̂_{t+1}) ≤ Δ · λ'(0)ρ'(1) · D(R̂_t) + O(D(R̂_t)²). Since Δ · λ'(0)ρ'(1) < 1 by the theorem's conditions, for small enough D(R̂_t) we have D(R̂_{t+1}) ≤ K · D(R̂_t), where K < 1, and thus D(R̂_t) descends to zero. Further details, including the relation between D(R̂_t) and P_e^t, are provided in Appendix VI.

D. Gaussian Approximation 

With binary LDPC codes, Chung et al. [9] observed that the rightbound messages of density evolution are well approximated by Gaussian random variables. Furthermore, the symmetry of the messages in binary LDPC decoding implies that the mean m and variance σ² of the random variable are related by σ² = 2m. Thus, the distribution of a symmetric Gaussian random variable may be described by a single parameter: σ. This property was also observed by ten Brink et al. [35] and is essential to their development of EXIT charts. In the context of nonbinary LDPC codes, Li et al. [22] obtained a description of the (q − 1)-dimensional messages, under a Gaussian assumption, by q − 1 parameters.

In the following theorem, we use symmetry and permutation-invariance as defined in Sections V-B and VI-B to reduce the number of parameters from q − 1 to one. This is a key property that enables the generalization of EXIT charts to coset GF(q) LDPC codes.

Note that the theorem assumes a continuous Gaussian distribution. The definition of symmetry for LLR-vector random variables (Lemma 2) is extended to continuous distributions by replacing the probability function in (19) with a probability density function.

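Theorem 6: Let W be an LLR-vector random variable, Gaussian distributed with mean m and covariance matrix Σ. Assume that the probability density function f(w) of W exists and that Σ is nonsingular. Then W is both symmetric and permutation-invariant if and only if there exists σ > 0 such that

$$\mathbf{m} = \begin{bmatrix} \sigma^2/2 \\ \vdots \\ \sigma^2/2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma^2 & \sigma^2/2 & \cdots & \sigma^2/2 \\ \sigma^2/2 & \sigma^2 & \cdots & \sigma^2/2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2/2 & \sigma^2/2 & \cdots & \sigma^2 \end{bmatrix}$$

That is, m_i = σ²/2, i = 1, ..., q − 1, and Σ_ij = σ² if i = j and σ²/2 otherwise.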

The proof of this theorem is provided in Appendix VII. A Gaussian symmetric and permutation-invariant random variable is thus completely described by a single parameter σ. In Sections VII-B and VII-D we discuss the validity of the Gaussian assumption with coset GF(q) LDPC codes.
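Theorem 6 can be spot-checked numerically. The sketch below takes a prime q (so GF(q) arithmetic is integer arithmetic mod q), builds m and Σ as in the theorem, and verifies (19) pointwise on the log-density, where the normalizing constant cancels. It assumes the conventions (x^{+i})_k = x_{k+i} and w_k = log(x_0/x_k), so that the LLR vector transforms as w^{+i}_k = w_{k+i} − w_i with w_0 = 0, and (w^{×h})_k = w_{k·h}. These conventions are defined outside this excerpt and should be treated as assumptions; the function names are hypothetical.

```python
import numpy as np


def theorem6_params(q: int, sigma: float):
    """Mean vector and covariance matrix of Theorem 6 for (q-1)-dimensional LLR vectors."""
    s2 = sigma ** 2
    mean = np.full(q - 1, s2 / 2)
    cov = (s2 / 2) * (np.eye(q - 1) + np.ones((q - 1, q - 1)))  # sigma^2 on, sigma^2/2 off diagonal
    return mean, cov


def log_density(w, mean, cov) -> float:
    """Gaussian log-density up to an additive constant (the constant cancels in ratios)."""
    d = w - mean
    return float(-0.5 * d @ np.linalg.solve(cov, d))


def shift_llr(w, i: int, q: int) -> np.ndarray:
    """w^{+i} for prime q under the assumed convention: w^{+i}_k = w_{k+i} - w_i, with w_0 = 0."""
    full = np.concatenate(([0.0], w))  # full[k] = w_k, k = 0, ..., q-1
    return np.array([full[(k + i) % q] - full[i] for k in range(1, q)])
```

Permutation-invariance is immediate here because every coordinate permutation w ↦ w^{×h} leaves this m and Σ unchanged. Symmetry amounts to log f(w) − log f(w^{+i}) = w_i, which the check below confirms numerically.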
VII. DESIGN OF COSET GF(q) LDPC CODES

With binary LDPC codes, design of edge distributions is frequently done using extrinsic information transfer 
(EXIT) charts [35]. EXIT charts are particularly suited for designing LDPC codes for AWGN channels. In this 
section we develop EXIT charts for coset GF(q) LDPC codes. Throughout the section we assume transmission over AWGN channels.

A. EXIT Charts 

Formally, EXIT charts track the mutual information I(C;W) between the transmitted code symbol C at an average variable node⁸ and the rightbound (leftbound) message W transmitted across an edge emanating from it. If this information is zero, then the message is independent of the transmitted code symbol and thus the probability of error is (q − 1)/q. As the information approaches 1, the probability of error approaches zero. Note that we assume that the base of the log function in the mutual information is q, and thus 0 ≤ I(C;W) ≤ 1.

I(C;W) is taken to represent the distribution of the message W. That is, unlike density evolution, where the entire distribution of the message W at each iteration is recorded, with EXIT charts I(C;W) is assumed to be a faithful surrogate for that distribution (we will shortly elaborate on how this is done).

With EXIT charts, two curves (functions) are computed: the VND (variable node decoder) curve and the CND (check node decoder) curve, corresponding to the rightbound and leftbound steps of density evolution, respectively. The argument to each curve is denoted I_A and the value of the curve is denoted I_E. With the VND curve, I_A is interpreted as equal to the functional I(C;L_t) when applied to the distribution of the leftbound messages at a given iteration t. The output I_E is interpreted as equal to I(C;R_t), where R_t is the rightbound message produced at the following rightbound iteration. With the CND curve, the roles are reversed: I_A corresponds to the rightbound messages and I_E to the leftbound messages they produce.

Note that unlike density evolution, where the densities are tracked from one iteration to another, the VND and CND curves are evaluated for every possible value of their argument I_A. However, a decoding trajectory that produces an approximation of the functionals I(C;L_t) and I(C;R_t) at each iteration may be computed (see [36] for a discussion of the trajectory).

The decoding process is predicted to converge if, after each decoding iteration (comprising a leftbound and a rightbound iteration), the resulting I_E = I(C;R_{t+1}) is larger than I_A = I(C;R_t) of the previous iteration. We therefore require I_{E,VND}(I_{E,CND}(I_A)) > I_A for all I_A ∈ [0, 1). Equivalently, I_{E,VND}(I_A) > I_{E,CND}^{−1}(I_A). In an EXIT chart, the CND curve is plotted with its I_A and I_E axes reversed (see, for example, Fig. 6). The decoding process is thus predicted to converge if and only if the VND curve lies strictly above the reversed-axes CND curve.
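Once the two curves are available as functions, the convergence criterion and the predicted decoding trajectory can be checked numerically. A minimal sketch, assuming the VND and CND curves are given as Python callables on [0, 1]; the grid size, the stopping thresholds and the toy curves used for illustration are our own hypothetical choices, not curves of any code in this paper.

```python
def tunnel_is_open(vnd, cnd, grid_points=1000):
    """Check I_E,VND(I_E,CND(I_A)) > I_A on the grid I_A = k/grid_points, k = 0..grid_points-1."""
    for k in range(grid_points):
        i_a = k / grid_points
        if vnd(cnd(i_a)) <= i_a:
            return False
    return True


def decoding_trajectory(vnd, cnd, max_iterations=1000, target=0.999, min_step=1e-9):
    """Predicted trajectory I_{t+1} = I_E,VND(I_E,CND(I_t)), starting from I_0 = 0.

    Stops when the target is reached or when the increase falls below min_step
    (a stalled trajectory, i.e. a closed tunnel)."""
    trajectory = [0.0]
    for _ in range(max_iterations):
        nxt = vnd(cnd(trajectory[-1]))
        trajectory.append(nxt)
        if nxt >= target or nxt - trajectory[-2] < min_step:
            break
    return trajectory


# Hypothetical toy curves, used only to exercise the two functions.
def toy_vnd(i):
    return min(1.0, 0.2 + 0.9 * i)


def toy_cnd(i):
    return i


print(tunnel_is_open(toy_vnd, toy_cnd))              # True
print(len(decoding_trajectory(toy_vnd, toy_cnd)))    # 8
```

Checking the composed map against the identity is the same test as comparing the VND curve with the reversed-axes CND curve, but it avoids having to invert the CND curve numerically.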

B. Using I(C;W) as a Surrogate

Let W be a leftbound or rightbound message at some iteration of belief propagation. Strictly speaking, an approximation of I(C;W) requires not only knowledge of the distribution of W but primarily knowledge of the conditional distribution Pr[W | C = i] for all i = 0, …, q − 1 (we assume that C is uniformly distributed). However, as shown in Appendix III-A, the messages of the coset GF(q) LDPC decoder satisfy

$$ \Pr[W = \mathbf{w} \mid C = i] = \Pr[W = \mathbf{w}^{+i} \mid C = 0]. $$

Thus, we may restrict ourselves to an analysis of the conditional distribution Pr[W | C = 0].

Lemma 11: Under the tree assumption, the above-defined W satisfies

$$ I(C;W) = 1 - E\left[\log_q\left(1 + \sum_{i=1}^{q-1} e^{-W_i}\right) \;\middle|\; C = 0\right] \qquad (26) $$

The proof of this lemma is provided in Appendix VIII-A. Note that by a lemma in Appendix III-A, we may replace the conditioning on C = 0 in (26) by a conditioning on the transmission of the all-zero codeword. In the remainder of this section, we will assume that all distributions are conditioned on the all-zero codeword assumption.

In their development of EXIT charts for binary LDPC codes, ten Brink et al. [35] confine their attention to LLR message distributions that are Gaussian and symmetric. Under these assumptions, a message distribution is uniquely described by its variance σ². For every value of σ, they evaluate (26) (with q = 2) when applied to the corresponding Gaussian distribution. The result, denoted J(σ), is shown to be monotonically increasing in σ. Thus J^{−1}(·) is well-defined. Given I = I(C;W), J^{−1}(I) can be applied to obtain the σ that describes the corresponding distribution of W. Thus, I(C;W) uniquely defines the entire distribution of W.

The Gaussian assumption is not strictly true. With binary LDPC codes, assuming transmission over an AWGN channel, the distributions of the rightbound messages are approximately Gaussian mixtures (with irregular codes), and the distributions of the leftbound messages resemble "spikes". The EXIT method of [35] nonetheless models the distributions as Gaussian, and the simulation results provided there indicate that this approach still produces a very close prediction of the performance of binary LDPC codes.

With coset GF(q) LDPC codes, we discuss two methods for designing EXIT charts. The first method models the LLR-vector message distributions as Gaussian, following the example of [35]. This modelling also enables us to evaluate the VND and CND curves using approximations that were developed in [35], thus greatly simplifying their computation.

However, modelling the rightbound message distributions of coset GF(q) LDPC codes as Gaussian is less accurate than it is with binary LDPC codes. As we will explain in Section VII-D, this results from the distribution of the initial messages, which is not Gaussian even on an AWGN channel. In Section VII-D we will therefore develop an alternative approach, which models the rightbound distributions more accurately. We will then apply this approach in Section VII-E to produce an alternative method for computing EXIT charts. With this method, the VND and CND curves are more difficult to compute. However, the method produces codes with approximately 1 dB better performance.

⁸In an earlier definition, the notation C was used to denote a code rather than a codeword symbol. The distinction between the two meanings is to be made based on the context of the discussion.
C. Computation of EXIT Charts, Method 1

With this method, we confine our attention to distributions that are permutation-invariant⁹, symmetric and Gaussian. By Theorem 6, under these assumptions, a (q − 1)-dimensional LLR-vector message distribution is uniquely defined by a parameter σ. We proceed to define J(σ) in a manner similar to that of [35], namely as the value of (26) when applied to the distribution corresponding to σ. In Appendix VIII-D we show that J(σ) is monotonically increasing and thus J^{−1}(·) is well defined. Given I = I(C;W), the distribution of W may be obtained in the same way as in [35].
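For q > 2, J(σ) has no convenient closed form, but it can be estimated by Monte Carlo. A minimal sketch, assuming that W is a true LLR vector, so that I(C;W) = 1 − E[log_q(1 + Σ_{i=1}^{q−1} e^{−W_i}) | C = 0], and drawing W from the one-parameter Gaussian family of Theorem 6; J^{−1} is then obtained by bisection, using the monotonicity of J. The sample size, seed and bisection bracket are illustrative choices, and this is not presented as the authors' implementation.

```python
import math
import random


def log_q_one_plus_sum_exp_neg(w, q):
    """Numerically stable log_q(1 + sum_i exp(-w_i))."""
    exponents = [0.0] + [-wi for wi in w]
    top = max(exponents)
    return (top + math.log(sum(math.exp(e - top) for e in exponents))) / math.log(q)


def j_function(sigma, q, n_samples=2000, seed=0):
    """Monte Carlo estimate of J(sigma) = 1 - E[log_q(1 + sum_i exp(-W_i))], where W is
    the symmetric, permutation-invariant Gaussian LLR vector of Theorem 6.
    A fixed seed reuses the same random numbers for every sigma, which keeps the
    estimate a smooth function of sigma (helpful for the bisection below)."""
    rng = random.Random(seed)
    scale = sigma / math.sqrt(2.0)
    total = 0.0
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)
        w = [sigma ** 2 / 2.0 + scale * (shared + rng.gauss(0.0, 1.0))
             for _ in range(q - 1)]
        total += log_q_one_plus_sum_exp_neg(w, q)
    return 1.0 - total / n_samples


def j_inverse(info, q, lo=1e-3, hi=20.0, iterations=40, **kwargs):
    """Invert J by bisection on [lo, hi], relying on J being increasing in sigma."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if j_function(mid, q, **kwargs) < info:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In practice one would tabulate J once per q on a grid of σ values and interpolate, rather than re-running the simulation inside every bisection step.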

We use the following method to compute the VND and CND curves, based on a development of ten 
Brink et al. [35] for binary LDPC codes. 

1) The VND curve. By (15), a rightbound message is a sum of incoming leftbound messages and an initial message. Let I_A and I^{(0)} denote the mutual-information functional of the incoming leftbound messages and of the initial messages, respectively. By Lemma 5, I^{(0)} equals the equiprobable-signalling capacity of the channel with the mapping δ(·). It may be obtained by numerically evaluating I(U;Y') as defined in Section V-C. For each left-degree i, we let I_{E,VND}(I_A; i, I^{(0)}) denote the value of the VND curve when confined to the distribution of rightbound messages across edges whose left-degree is i. We now employ the following approximation, which holds under the tree assumption when both the initial and the incoming leftbound messages are Gaussian:

$$ I_{E,VND}(I_A; i, I^{(0)}) \approx J\left(\sqrt{(i-1)\left[J^{-1}(I_A)\right]^2 + \left[J^{-1}(I^{(0)})\right]^2}\right) $$

The validity of this approximation relies on the observation that a rightbound message (15) is equal to a sum of i − 1 i.i.d. leftbound messages and an independently distributed initial message (under the tree assumption). [J^{−1}(I_A)]² is the variance of each of the leftbound messages and [J^{−1}(I^{(0)})]² is the variance of the initial message, and hence the variance of the rightbound message is (i − 1)[J^{−1}(I_A)]² + [J^{−1}(I^{(0)})]². Since, by Theorem 6, a sum of independent symmetric, permutation-invariant Gaussian vectors is again of this form, with σ² equal to the sum of the individual σ² values, the rightbound message is described by the parameter appearing inside J above.

2) The CND curve. Let I_{E,CND}(I_A; j) denote the value of the CND curve when confined to the distribution of leftbound messages across edges whose right-degree is j. We use the approximation

$$ I_{E,CND}(I_A; j) \approx 1 - J\left(\sqrt{j-1}\; J^{-1}(1 - I_A)\right) $$

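This approximation is based on a similar approximation that was used in [35] and relies on Sharon et al. [31]. In the context of coset GF(q) LDPC codes, we have verified its effectiveness empirically. Given an edge distribution pair (λ, ρ), we have

$$ I_{E,VND}(I_A) = \sum_{i=2}^{c} \lambda_i\, I_{E,VND}(I_A; i, I^{(0)}), \qquad I_{E,CND}(I_A) = \sum_{j=2}^{d} \rho_j\, I_{E,CND}(I_A; j) \qquad (27) $$

where c and d denote the maximal left and right degrees, respectively.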
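The approximations above, combined through (27), can be coded directly once J and J^{−1} are available. In this minimal sketch, J and J^{−1} are passed in as functions, and the edge distributions are given as dictionaries mapping a degree to its edge fraction (our own convention). To keep the example checkable, it is exercised with a hypothetical closed-form stand-in, J(s) = 1 − e^{−s²/4}, which increases monotonically from 0 to 1 but is not the J of this paper; with this stand-in the per-degree curves reduce to 1 − (1 − I_A)^{i−1}(1 − I^{(0)}) and I_A^{j−1}.

```python
import math


def vnd_curve(i_a, lam, i0, j_fn, j_inv):
    """Method-1 VND curve (27): sum over left-degrees i of
    lam[i] * J(sqrt((i - 1) * J^-1(I_A)^2 + J^-1(I0)^2))."""
    s_a, s_0 = j_inv(i_a), j_inv(i0)
    return sum(frac * j_fn(math.sqrt((deg - 1) * s_a ** 2 + s_0 ** 2))
               for deg, frac in lam.items())


def cnd_curve(i_a, rho, j_fn, j_inv):
    """Method-1 CND curve (27): sum over right-degrees j of
    rho[j] * (1 - J(sqrt(j - 1) * J^-1(1 - I_A)))."""
    s = j_inv(1.0 - i_a)
    return sum(frac * (1.0 - j_fn(math.sqrt(deg - 1) * s))
               for deg, frac in rho.items())


# Hypothetical closed-form stand-in for J (NOT the J of this paper),
# chosen only because its inverse is explicit: J(s) = 1 - exp(-s^2 / 4).
def toy_j(s):
    return 1.0 - math.exp(-s * s / 4.0)


def toy_j_inv(info):
    info = min(max(info, 0.0), 1.0 - 1e-12)   # keep the logarithm finite
    return math.sqrt(-4.0 * math.log(1.0 - info))


print(round(vnd_curve(0.5, {3: 1.0}, 0.5, toy_j, toy_j_inv), 6))   # 0.875
print(round(cnd_curve(0.9, {6: 1.0}, toy_j, toy_j_inv), 6))        # 0.59049
```

Passing J and J^{−1} as arguments keeps the combination rule (27) separate from the (expensive) numerical evaluation of J, which only needs to be done once for each q.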

Code design may be performed by fixing the right-distribution ρ and computing λ. As in [35], the following constraints are used in the design:

⁹Strictly speaking, rightbound messages are not permutation-invariant. However, in Appendix VIII-B we show that this does not pose a problem for the derivation of EXIT charts.
1) λ is required to be a valid probability vector. That is, λ_i ≥ 0 for all i and Σ_i λ_i = 1.

2) To ensure decoding convergence, we require I_{E,VND}(I; I^{(0)}) > I_{E,CND}^{−1}(I), as explained in Section VII-A, for all I belonging to a discrete, fine grid over (0, 1).

The design process seeks to maximize Σ_i λ_i/i, which, for a fixed ρ, is equivalent to maximizing the design rate of the code. Since both constraints are linear in λ, this can be done using a linear program. A similar process can be used to design ρ with λ fixed.
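With ρ fixed, the right-hand side of constraint 2 does not depend on λ, and by (27) the left-hand side is linear in λ, so the design is a standard linear program. A minimal sketch using SciPy's `scipy.optimize.linprog` with the HiGHS solver; the per-degree VND curve and the inverted CND curve are supplied as functions, and a small margin stands in for the strict inequality. The toy curves below are hypothetical (they are the EXIT curves of a binary erasure channel with erasure probability 0.5 and ρ(6) = 1), chosen only because they make the output easy to sanity-check.

```python
from scipy.optimize import linprog


def design_lambda(degrees, vnd_by_degree, cnd_inverse, grid, margin=1e-3):
    """Maximize sum_i lambda_i / i subject to lambda_i >= 0, sum_i lambda_i = 1 and
    sum_i lambda_i * vnd_by_degree(I, i) >= cnd_inverse(I) + margin for every I in grid.
    The margin is our stand-in for the strict inequality of constraint 2."""
    c = [-1.0 / d for d in degrees]                        # linprog minimizes
    a_ub = [[-vnd_by_degree(x, d) for d in degrees] for x in grid]
    b_ub = [-(cnd_inverse(x) + margin) for x in grid]
    res = linprog(c, A_ub=a_ub, b_ub=b_ub,
                  A_eq=[[1.0] * len(degrees)], b_eq=[1.0],
                  bounds=[(0.0, None)] * len(degrees), method="highs")
    if not res.success:
        raise ValueError("no feasible lambda: " + res.message)
    return {d: float(f) for d, f in zip(degrees, res.x)}


# Hypothetical toy curves: EXIT curves of a binary erasure channel with erasure
# probability EPS and a right-regular degree of 6 (not curves from this paper).
EPS = 0.5


def toy_vnd(x, deg):
    return 1.0 - EPS * (1.0 - x) ** (deg - 1)


def toy_cnd_inverse(x):
    return x ** 0.2                     # inverse of I_E,CND(I) = I**5


grid = [k / 100 for k in range(1, 100)]
lam = design_lambda(list(range(2, 13)), toy_vnd, toy_cnd_inverse, grid)
rate = 1.0 - (1.0 / 6.0) / sum(f / d for d, f in lam.items())   # design rate, rho(6) = 1
```

For these toy curves, any λ satisfying the constraints must have a design rate below the erasure-channel capacity of 0.5, which gives a quick sanity check on the returned distribution. A design with a nonzero margin, as in Section VII-F, only changes the `margin` argument.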



D. More Accurate Modelling of Message Distributions 

We now provide a more accurate model for the rightbound messages, as mentioned in Section VII-B. We focus, for simplicity, on regular LDPC codes. Observe that the computation of the rightbound message using (15) involves the summation of i.i.d. leftbound messages. This sum is typically well approximated by a Gaussian random variable¹⁰. To this sum, the initial message r^{(0)} is added. With binary LDPC codes, transmission over an AWGN channel results in an initial message r^{(0)} which is also Gaussian distributed (assuming the all-zero codeword was transmitted). Thus, the rightbound messages are very closely approximated by a Gaussian random variable.

With coset GF(q) LDPC codes, the initial message is not well approximated by a Gaussian random variable, as illustrated by the following lemma:

Lemma 12: Consider the initial message produced at some variable node, under the all-zero codeword assumption, using LLR representation. Assume the transmission is over an AWGN channel with noise variance σ_z² and with a mapping δ(·). Let the coset symbol at the variable node be v. Then the initial message r^{(0)} is given by r^{(0)} = α(v) + β(v)·z, where z is the noise produced by the channel and α(v) and β(v) are (q − 1)-dimensional vectors, dependent on v, whose components are given by

$$ \alpha(v)_i = \frac{1}{2\sigma_z^2}\left(\delta(v) - \delta(v+i)\right)^2, \qquad \beta(v)_i = \frac{1}{\sigma_z^2}\left(\delta(v) - \delta(v+i)\right), \qquad i = 1, \dots, q-1 $$

The proof of this lemma is straightforward from the observation that the received channel output is y = δ(v) + z.

In our analysis, we assume a random coset symbol V that is uniformly distributed in GF(q). Thus, α(V) and β(V) are random variables, whose values are determined by the mapping δ(·) and by the noise variance σ_z². The distribution of the channel noise Z is determined by σ_z². The distribution of the initial messages is therefore determined by δ(·) and σ_z².
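Lemma 12 is easy to verify numerically by comparing the closed form with a direct computation of the initial LLR vector. Under the all-zero codeword assumption the code symbol is 0, so the transmitted signal is δ(v) and the hypothesis "code symbol i" corresponds to δ(v + i). A minimal sketch, assuming q = 2^m, so that addition in GF(q) is the bitwise XOR of the binary representations (for a prime q one would use addition modulo q instead); δ is a Python list indexed by the field element, and the 4-PAM labelling used below is a hypothetical example.

```python
import math


def initial_message(y, v, delta, sigma_z):
    """LLR-vector initial message r_i = log p(y | c = 0) - log p(y | c = i),
    i = 1, ..., q-1, when the code symbol is 0 and the coset symbol is v,
    so that delta[v ^ i] is the signal for code symbol i (GF(2^m) addition = XOR)."""
    def log_lik(x):                       # Gaussian log-likelihood, constant dropped
        return -(y - x) ** 2 / (2.0 * sigma_z ** 2)
    return [log_lik(delta[v]) - log_lik(delta[v ^ i]) for i in range(1, len(delta))]


def alpha_beta(v, delta, sigma_z):
    """The vectors alpha(v) and beta(v) of Lemma 12."""
    diffs = [delta[v] - delta[v ^ i] for i in range(1, len(delta))]
    alpha = [d * d / (2.0 * sigma_z ** 2) for d in diffs]
    beta = [d / sigma_z ** 2 for d in diffs]
    return alpha, beta


# 4-PAM with unit average energy, indexed by the GF(4) element (hypothetical labelling).
delta = [x / math.sqrt(5.0) for x in (-3.0, -1.0, 1.0, 3.0)]
sigma_z, v, z = 0.5, 2, 0.3
r = initial_message(delta[v] + z, v, delta, sigma_z)    # y = delta(v) + z
alpha, beta = alpha_beta(v, delta, sigma_z)
print(all(math.isclose(ri, a + b * z) for ri, a, b in zip(r, alpha, beta)))  # True
```

For a fixed v, r^{(0)} moves along a line in the direction β(v) as z varies; drawing v uniformly therefore produces the mixture of one-dimensional Gaussians visible in Fig. 5(a).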

Fig. 5 presents the empirical distribution of LLR messages at several stages of the decoding process, as observed in simulations. The code was a (3,6) coset GF(3) LDPC code. Since q = 3, the LLR messages in this case are two-dimensional. The distribution of the initial messages (Fig. 5(a)) is seen to be a mixture of one-dimensional Gaussian curves, as predicted by Lemma 12. The leftbound messages at the first iteration are shown in Fig. 5(b). We model their distribution as Gaussian, although it resembles a "spike" and not the distribution of a Gaussian random variable (this situation is similar to the one with binary LDPC codes [9]). Fig. 5(c) presents the sum of leftbound messages computed in the process of evaluating (15). As predicted, this sum is well approximated by a Gaussian random variable. Finally, the rightbound messages at the first iteration are given in Fig. 5(d).

¹⁰Quantification of the quality of the approximation is beyond the scope of this discussion. "Well approximated" is to be understood in a heuristic sense, in the context of suitability to design using EXIT charts.






Following the above discussion, we model the distribution of the rightbound messages as the sum of two random vectors. The first is distributed as the initial messages above, and the second (the intermediate sum of leftbound messages) is modelled as Gaussian¹¹.

The intermediate value (the second random variable) is symmetric and permutation-invariant. This may be seen from the fact that the leftbound messages are symmetric and permutation-invariant (by Theorems 3 and 4), and from a lemma in Appendix III-E together with Lemma 22 (Appendix IV-E). Thus, by Theorem 6, it is characterized by a single parameter σ.

Summarizing, the approximate distribution of rightbound messages is determined by three parameters: σ_z and δ(·), which determine the distribution of the initial message, and σ, which determines the distribution of the intermediate value.

E. Computation of EXIT Charts, Method 2 

The second method for designing EXIT charts differs from the first (Section VII-C) in its modelling of the initial and rightbound message distributions, following the discussion in Section VII-D. We continue, however, to model the leftbound messages as Gaussian.

For every value of σ, we define J_R(σ; σ_z, δ) (σ_z and δ are fixed parameters) in a manner analogous to J(σ) as discussed in Section VII-C. That is, J_R(σ; σ_z, δ) equals (26) when applied to the rightbound distribution corresponding to σ, σ_z and δ. In an EXIT chart, σ_z and δ(·) are fixed. The remaining parameter that determines the rightbound distribution is thus σ, and σ = J_R^{−1}(I; σ_z, δ) is well-defined¹². The computation of J_R and J_R^{−1} is discussed in Appendix VIII-E.
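J_R can be estimated by Monte Carlo along the lines of Section VII-D. In this minimal sketch, each sample adds an initial message built from Lemma 12 (with a uniformly drawn coset symbol and Gaussian channel noise) to an independent vector from the one-parameter Gaussian family of Theorem 6, and the mutual information is estimated through 1 − E[log_q(1 + Σ_i e^{−W_i})], which is valid because both summands are true LLR vectors. We assume q is a power of two (GF(q) addition as XOR); the sample size and seed are illustrative, and this is not presented as the procedure of Appendix VIII-E.

```python
import math
import random


def j_r(sigma, sigma_z, delta, n_samples=4000, seed=0):
    """Monte Carlo estimate of J_R(sigma; sigma_z, delta), assuming q = len(delta)
    is a power of two (GF(q) addition = XOR of the binary labels).

    Each sample is initial message + intermediate value, where the initial message
    follows Lemma 12 with a uniformly drawn coset symbol V and noise Z ~ N(0, sigma_z^2),
    and the intermediate value is an independent Gaussian vector from the
    one-parameter family of Theorem 6 (parameter sigma)."""
    q = len(delta)
    rng = random.Random(seed)                 # fixed seed: common random numbers
    scale = sigma / math.sqrt(2.0)
    total = 0.0
    for _ in range(n_samples):
        v = rng.randrange(q)                  # coset symbol
        z = rng.gauss(0.0, sigma_z)           # channel noise
        shared = rng.gauss(0.0, 1.0)
        exponents = [0.0]                     # the "1" term of 1 + sum_i exp(-w_i)
        for i in range(1, q):
            d = delta[v] - delta[v ^ i]
            initial = d * d / (2.0 * sigma_z ** 2) + d * z / sigma_z ** 2
            intermediate = sigma ** 2 / 2.0 + scale * (shared + rng.gauss(0.0, 1.0))
            exponents.append(-(initial + intermediate))
        top = max(exponents)                  # stable log-sum-exp
        total += (top + math.log(sum(math.exp(e - top) for e in exponents))) / math.log(q)
    return 1.0 - total / n_samples
```

With σ = 0 the estimate reduces to the mutual information of the initial message alone, i.e. the equiprobable-signalling capacity of the constellation normalized by log2 q, which is a useful sanity check.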

The following method is used to compute the VND and CND curves. 

1) The VND curve. For each left-degree i, we evaluate I_{E,VND}(I_A; i, σ_z, δ) (defined in a manner analogous to I_{E,VND}(I_A; i, I^{(0)}) of Section VII-C) using the following approximation:

$$ I_{E,VND}(I_A; i, \sigma_z, \delta) \approx J_R\left(\sqrt{i-1}\; J^{-1}(I_A);\ \sigma_z, \delta\right) $$

Here √(i − 1)·J^{−1}(I_A) is the parameter of the intermediate sum of i − 1 i.i.d. Gaussian leftbound messages.

2) The CND curve. Let I_{E,CND}(I_A; j, σ_z, δ) be defined in a manner analogous to I_{E,CND}(I_A; j) of Section VII-C. The parameters σ_z and δ are used in conjunction with σ = J_R^{−1}(I_A; σ_z, δ) to characterize the distribution of the rightbound messages at the input of the check nodes. The computation of I_{E,CND}(I_A; j, σ_z, δ) is done empirically and is elaborated in Appendix VIII-F.

Given an edge distribution pair (λ, ρ), we evaluate I_{E,VND}(I_A; σ_z, δ) and I_{E,CND}(I_A; σ_z, δ) from the above-computed per-degree values {I_{E,VND}(I_A; i, σ_z, δ)}_i and {I_{E,CND}(I_A; j, σ_z, δ)}_j using expressions similar to (27).

Note that J_R(σ; σ_z, δ) needs to be computed once for each choice of σ_z and δ(·). I_{E,CND}(I_A; j, σ_z, δ) needs, in addition, to be computed for each value of j. J(σ) needs to be computed once for each choice of q.

¹¹Note that with irregular codes, the number of i.i.d. leftbound messages that are summed is itself a random variable (its distribution determined by λ), and thus the distribution of the sum resembles a Gaussian mixture rather than a Gaussian random variable. However, we continue to model it as Gaussian, following the example that was set with binary codes [35].

¹²See Appendix VIII-E for a more accurate discussion of this matter.
[Figure 5 panels: (a) Initial messages; (b) Leftbound messages; (c) Sum of leftbound messages, prior to the addition of the initial message; (d) Rightbound messages.]

Fig. 5. Empirical distributions of the messages of a (3,6) ternary coset LDPC code.

Design of the edge distributions λ and ρ may be performed by linear programming in the same manner as in Section VII-C. Further details are provided in Section VII-F below.

F. Design Examples 

We designed codes for spectral efficiencies of 6 bits/s/Hz (3 bits per dimension) and 8 bits/s/Hz (4 bits per dimension) over the AWGN channel. In all our constructions, we used method 2 above (Section VII-E) to compute the EXIT charts. Our Matlab source code is provided at [4].

For the code at 6 bits/s/Hz, we set the alphabet size at q = 32. We used a nonuniformly-spaced signal constellation A (following the discussion of Section III-C). The constellation was obtained by applying the following method, which is a variation of a method suggested by Sun and van Tilborg [33]. First, the unique points x_0 < x_1 < … < x_{q−1} were computed such that, for X ~ N(0, 1), Pr[x_i < X < x_{i+1}] = 1/(q + 1) for i = 0, …, q − 2, and Pr[X < x_0] = Pr[X > x_{q−1}] = 1/(q + 1). The signal constellation was obtained by scaling the result so that the average energy was 1. The mapping δ from the code alphabet is given below, with its elements listed in ascending order using the representation of GF(32) elements as binary numbers (e.g. δ(00000) = −2.0701, δ(00001) = −1.7096). Note, however, that our simulations indicate that for a given A, different mappings δ typically render the same performance.

δ = [-2.0701, -1.7096, -1.473, -1.2896, -1.1362, -1.0022, -0.88161, -0.77061, -0.66697, -0.569,
-0.47523, -0.38474, -0.29689, -0.21075, -0.12592, -0.041887, 0.041887, 0.12592, 0.21075, 0.29689,
0.38474, 0.47523, 0.569, 0.66697, 0.77061, 0.88161, 1.0022, 1.1362, 1.2896, 1.473, 1.7096, 2.0701]
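The two probability conditions above place x_k at the standard normal quantile of order (k + 1)/(q + 1), so the construction can be reproduced with Python's standard library (`statistics.NormalDist`, Python 3.8 or later). A minimal sketch of our own; in our check, for q = 32 the result agrees with the list above to within about 10^{−2} (the extreme points come out at about ±2.07).

```python
import math
from statistics import NormalDist


def quantile_constellation(q):
    """Points x_0 < ... < x_{q-1} with Pr[X < x_0] = Pr[x_k < X < x_{k+1}]
    = Pr[X > x_{q-1}] = 1/(q+1) for X ~ N(0, 1), i.e. x_k = Phi^{-1}((k+1)/(q+1)),
    scaled to unit average energy."""
    points = [NormalDist().inv_cdf((k + 1) / (q + 1)) for k in range(q)]
    energy = sum(x * x for x in points) / q
    return [x / math.sqrt(energy) for x in points]


delta = quantile_constellation(32)
print(round(delta[0], 2), round(delta[-1], 2))   # -2.07 2.07
```

Because the points are equiprobable quantiles of a Gaussian, they are denser near the origin, so uniform selection of the points approximates Gaussian-like signalling; this is the source of the shaping gain discussed below.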

We fixed ρ(7) = 1 and iteratively applied linear programming, first to obtain λ, and then, fixing λ, to obtain a better ρ.

Rather than require I_{E,VND}(I; σ_z, δ) > I_{E,CND}^{−1}(I; σ_z, δ) as in Sections VII-A and VII-C, we enforced a more stringent condition when designing λ. We required I_{E,VND}(I; σ_z, δ) > I_{E,CND}^{−1}(I; σ_z, δ) + ε(I), where ε(I) equals 5·10^{−3} when I ∈ (0, 0.5), equals 4·10^{−3} when I ∈ [0.5, 0.6), and is zero elsewhere. Similarly, when designing ρ, we required I_{E,CND}(I; σ_z, δ) > I_{E,VND}^{−1}(I; σ_z, δ) + ε(I).

After a few linear programming iterations, we obtained the edge distributions λ(2, 5, 6, 16, 30) = (0.5768, 0.1498, 0.07144, 0.1045, 0.09752) and ρ(5, 6, 7, 8, 20) = (0.09973, 0.02331, 0.5885, 0.1833, 0.1051). The code rate is 3/5 (in GF(32) symbols per channel use), equal to 3 bits per channel use and a spectral efficiency of 6 bits/s/Hz. Interestingly, this code is right-irregular, unlike typical binary LDPC codes. Fig. 6 presents the EXIT chart for the code (computed by method 2). Note that the CND curve in Fig. 6 does not begin at I_A = 0. This is discussed in the Appendix.
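The stated rate follows from the usual design-rate formula R = 1 − (Σ_j ρ_j/j)/(Σ_i λ_i/i) for edge-perspective distributions. A short check of our own, with the distributions given as dictionaries mapping a degree to its edge fraction; multiplying R by log2 q = 5 gives the number of bits per channel use.

```python
import math


def design_rate(lam, rho):
    """R = 1 - (sum_j rho_j / j) / (sum_i lambda_i / i), edge-perspective fractions."""
    left = sum(frac / deg for deg, frac in lam.items())
    right = sum(frac / deg for deg, frac in rho.items())
    return 1.0 - right / left


lam_6 = {2: 0.5768, 5: 0.1498, 6: 0.07144, 16: 0.1045, 30: 0.09752}
rho_6 = {5: 0.09973, 6: 0.02331, 7: 0.5885, 8: 0.1833, 20: 0.1051}
rate = design_rate(lam_6, rho_6)
print(round(rate, 3), round(rate * math.log2(32), 2))    # 0.6 3.0
```

The same function applies unchanged to the other edge distributions reported in this paper, since all of them are given in the edge perspective.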

Simulation results indicate successful decoding at an SNR of 18.55 dB. The block length was 1.8·10^ symbols, and decoding typically converged after approximately 150–200 iterations. The symbol error rate, after 50 simulations, was approximately 10^. The unconstrained Shannon limit (i.e. not restricted to any signal constellation) at this rate is 17.99 dB, and thus our gap from this limit is 0.56 dB. This gap is well below the shaping gap, which at 6 bits/s/Hz is approximately 1.1 dB.
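The unconstrained limits quoted in this section follow from the real AWGN capacity ½·log2(1 + SNR) per dimension: a rate of b bits per dimension (2b bits/s/Hz) requires SNR = 2^{2b} − 1. A short check:

```python
import math


def unconstrained_shannon_limit_db(bits_per_dimension):
    """SNR in dB at which the real AWGN capacity 0.5*log2(1 + SNR) equals the rate."""
    return 10.0 * math.log10(2.0 ** (2.0 * bits_per_dimension) - 1.0)


for spectral_efficiency in (6, 8):            # bits/s/Hz, i.e. 2 x bits per dimension
    limit = unconstrained_shannon_limit_db(spectral_efficiency / 2)
    print(spectral_efficiency, round(limit, 3))   # 6 17.993, then 8 24.065
```

Subtracting these limits from the simulated thresholds gives the gaps of 0.56 dB and about 1 dB reported for the two codes.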

We can gain some interesting insight into these figures by considering the equiprobable-signalling Shannon limit for our constellation (defined based on the equiprobable-signalling capacity, which was introduced in Section V-C). At 6 bits/s/Hz, this limit equals 18.25 dB. The equiprobable-signalling Shannon limit is the best we can hope for with any design method for the edge distributions of our code. The gap between our code's threshold and this limit is just 0.3 dB, indicating the effectiveness of our EXIT chart design method.

The equiprobable-signalling Shannon limit for a 32-PAM constellation at 6 bits/s/Hz is 19.11 dB. The gap between this limit and the above-discussed limit for our constellation is 0.86 dB. This is the shaping gain obtained from the use of a nonuniform signal constellation.
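The equiprobable-signalling limits can be estimated by Monte Carlo integration of the mutual information between a uniformly chosen constellation point X and the channel output Y = X + Z. A minimal sketch, assuming a real constellation normalized to unit average energy and SNR defined as that energy divided by the noise variance (the convention under which the unconstrained limit is ½·log2(1 + SNR)); finding the limit itself would additionally require a root search over the SNR. The function names are ours.

```python
import math
import random


def pam(m):
    """Uniformly spaced m-PAM constellation scaled to unit average energy."""
    raw = [2 * k - (m - 1) for k in range(m)]
    energy = sum(r * r for r in raw) / m
    return [r / math.sqrt(energy) for r in raw]


def equiprobable_capacity(points, snr_db, n_samples=20000, seed=0):
    """Monte Carlo estimate, in bits per real dimension, of I(X;Y) for Y = X + Z with X
    uniform over `points` (unit average energy assumed) and Z ~ N(0, 1/SNR)."""
    m = len(points)
    noise_var = 10.0 ** (-snr_db / 10.0)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = points[rng.randrange(m)]
        z = rng.gauss(0.0, math.sqrt(noise_var))
        # log of sum_j p(y | x_j) / p(y | x), evaluated with a stable log-sum-exp
        exponents = [-((x - xj + z) ** 2 - z ** 2) / (2.0 * noise_var) for xj in points]
        top = max(exponents)
        total += top + math.log(sum(math.exp(e - top) for e in exponents))
    return math.log2(m) - total / (n_samples * math.log(2.0))
```

For example, `equiprobable_capacity(pam(4), 5.12)` comes out close to 1 bit per dimension, in line with the 5.12 dB figure quoted for 4-PAM in Section VIII-A; passing the nonuniform constellation δ instead of `pam(32)` allows the two constellations to be compared.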

Fig. 6. An EXIT chart, computed using method 2, for a code at a spectral efficiency of 6 bits/s/Hz and an SNR of 18.5 dB.

For the code at 8 bits/s/Hz, we set the alphabet size at q = 64. We used the same method to construct a nonuniformly-spaced signal constellation. The mapping to the signal constellation is given below.

δ = [-2.29, -1.98, -1.78, -1.63, -1.51, -1.4, -1.31, -1.23, -1.15, -1.08, -1.01, -0.951, -0.891, -0.834,
-0.78, -0.727, -0.676, -0.627, -0.579, -0.532, -0.486, -0.441, -0.397, -0.354, -0.311, -0.268, -0.226,
-0.185, -0.143, -0.102, -0.0613, -0.0204, 0.0204, 0.0613, 0.102, 0.143, 0.185, 0.226, 0.268, 0.311, 0.354, 0.397,
0.441, 0.486, 0.532, 0.579, 0.627, 0.676, 0.727, 0.78, 0.834, 0.891, 0.951, 1.01, 1.08, 1.15, 1.23, 1.31, 1.4, 1.51,
1.63, 1.78, 1.98, 2.29]

We fixed ρ(8) = 1 and applied one iteration of linear programming to obtain λ(2, 9, 29) = (0.7087, 0.1397, 0.1516). The code rate is 2/3 (in GF(64) symbols per channel use), equal to 4 bits per channel use and a spectral efficiency of 8 bits/s/Hz. Fig. 7 presents the EXIT charts for the code using the two methods.

Simulation results indicate successful decoding at an SNR of 25.06 dB over the AWGN channel. The block length was 10^ symbols, and decoding typically converged after approximately 70 iterations. The symbol error rate, after 100 simulations, was exactly zero. We also applied an approximation of density evolution by Monte Carlo simulations, as mentioned in Section VI-A, and obtained similar results. The gap between our code's threshold and the unconstrained Shannon limit, which at 8 bits/s/Hz is 24.06 dB, is 1 dB. This gap is smaller than the shaping gap, which at 8 bits/s/Hz is 1.3 dB. The equiprobable-signalling Shannon limit for our signal constellation at 8 bits/s/Hz is 24.34 dB. The gap between our code's threshold and this limit is thus only 0.72 dB.

VIII. Comparison with Other Bandwidth-Efficient Coding Schemes

The simulation results presented in Section VII-F indicate that coset GF(q) LDPC codes have remarkable performance over bandwidth-efficient channels. In this section, we compare their performance with multilevel coding using binary LDPC component codes and with turbo-TCM.
[Figure 7 panels: (a) An EXIT chart computed using method 1 at an SNR of 26.06 dB; (b) An EXIT chart computed using method 2 at an SNR of 25.06 dB.]

Fig. 7. EXIT charts for a code at a spectral efficiency of 8 bits/s/Hz.

A. Comparison with Multilevel Coding (MLC) 

Hou et al. [18] presented simulations for MLC over the AWGN channel at a spectral efficiency of 2 bits/s/Hz (equal to 1 bit per dimension), using a 4-PAM constellation. The equiprobable-signalling Shannon limit¹³ for 4-PAM at this rate is 5.12 dB (SNR). Their best results were obtained using multistage decoding (MSD). At a block length of 10^ symbols, their best code is capable of transmission within 1 dB of the Shannon limit with an average BER of about 10^. It is composed of binary LDPC component codes with maximum left-degrees of 15.

We designed edge distributions for two coset GF(4) LDPC codes at the same spectral efficiency, signal constellation and BER as [18]. Our first code's edge distributions are given by λ(2, 3, 4, 5, 6, 7, 15, 16, 20, 21) = (0.341895, 0.172092, 0.081613, 0.064992, 0.043213, 0.000037, 0.029562, 0.140071, 0.000002, 0.126522) and ρ(7) = 1. Our simulations at a block length of 10^ indicate that this code is capable of transmission within 0.55 dB of the Shannon limit (100 simulations), and thus has a substantial advantage over the above MLC LDPC code, which is capable of transmission only within 1 dB of the Shannon limit.

Our above code obtained its superior performance at the price of increased decoding complexity, in comparison with the MLC code of [18]. We also designed a second code, with a lower decoding complexity, in order to compare the two schemes when the complexity is restricted. This code's edge distributions are given by λ(2, 3, 6) = (0.3978, 0.2853, 0.3169) and ρ(5, 6) = (0.203, 0.797). Our simulation results indicate that the code is capable of reliable transmission within 0.8 dB of the Shannon limit. The code's maximum left-degree is 6, which is lower than that of the MLC code of [18]. Consequently, it has a lower level of connectivity in its Tanner graph, implying that its slightly better performance was achieved at a comparable decoding complexity. A precise comparison between the decoding complexities of the two codes must account for the entire edge distributions (rather than just the maximum left-degrees) and for the number of decoding iterations. Such a comparison is beyond the scope of this work.

¹³Throughout this section, we assume equiprobable signalling whenever we refer to the Shannon limit.

Hou et al. [18] also experimented at a large block length of 10^ symbols. Their best code is capable of transmission within 0.14 dB of the Shannon limit. At a slightly smaller block length (5·10^ symbols), our above-discussed first code is capable of transmission within 0.2 dB of the Shannon limit (14 simulations), and thus has slightly inferior performance. This may be attributed either to the smaller block length that we used, or to the availability of density evolution for the design of binary MLC component LDPC codes at large block lengths.

Hou et al. [18] obtained their remarkable performance at large block lengths also at the price of increased decoding complexity (the maximum left-degrees of their component codes are 50). It could be argued that increasing the decoding complexity could produce improved performance also at the above-mentioned block length of 10^. We believe this is not the case, because increasing the maximum left-degree would also increase the connectivity of the Tanner graph. At short block lengths, this would dramatically increase the number of cycles in the graph, thus reducing performance.

Summarizing, our simulations indicate that coset GF(q) LDPC codes have an advantage over MLC LDPC codes at short block lengths in terms of the gap from the Shannon limit. This result assumes no restriction on decoding complexity. The simulations also indicate that when decoding complexity is restricted, both schemes admit comparable performance. In this case, however, further research is required in order to provide a more accurate comparison of the two schemes.

B. Comparison with Turbo Trellis-Coded Modulation (Turbo-TCM) 

Robertson and Wörz [30] experimented with turbo-TCM at several spectral efficiencies and block lengths. The highest spectral efficiency at which they experimented was 5 bits/s/Hz. They used a 64-QAM constellation, and their best results were achieved at a block length of 3000 QAM symbols. They obtained a BER of 10^ at an SNR of about 16.85 dB. The equiprobable-signalling Shannon limit at 5 bits/s/Hz is 16.14 dB, and thus their result is within approximately 0.7 dB of the Shannon limit.

We experimented with an 8-PAM constellation and a block length of 6000 PAM symbols, which are the one-dimensional equivalents of two-dimensional 64-QAM and of 3000 QAM symbols. Our code's edge distributions are λ(2, 3, 4, 18) = (0.375115, 0.049623, 0.255708, 0.319554) and ρ(21) = 1. Simulation results indicate a symbol error rate of less than 10^ at an SNR of 16.6 dB (100 simulations). This result is within 0.46 dB of the Shannon limit, and thus improves on the above result of 0.7 dB.

IX. Conclusion 

A. Suggestions for Further Research 

1) Nonuniform labels. The labels of GF(q) LDPC codes, as defined in Section II-A, are randomly selected from GF(q)\{0} with uniform probability. Davey and MacKay [10], in their work on GF(q) LDPC codes for binary channels, suggested selecting them differently. It would be interesting to investigate their approach (and possibly other approaches to the selection of the labels) when applied to coset GF(q) LDPC codes for nonbinary channels.

2) Density evolution. In Section VI-A we discussed the difficulty in efficiently computing density evolution for nonbinary codes. An assumption in that discussion is that the densities would be represented on a grid of the form {−M/2·Δ, …, M/2·Δ}^{q−1} (assuming LLR-vector representation), requiring an amount of memory of the order of (M + 1)^{q−1}. However, a more efficient approach would be to experiment with other forms of quantization, perhaps tailored to each density. We have tried applying the Lloyd-Max algorithm to design such quantizers for each density. However, the computation of the algorithm, coupled with the actual application of the quantizer, is too computationally complex. An alternative approach would perhaps make use of a Gaussian approximation as described in Section VI-D to design effective quantizers.

3) Other surrogates for distributions. In [6], the functional EX (X denoting a message of a binary LDPC decoder) was used to lower-bound (rather than approximate) the asymptotic performance of binary LDPC codes. It would be interesting to find a similar scalar functional that can be used to bound the performance of coset GF(q) LDPC codes. Another possibility is to experiment with the function D(X), which is defined in Appendix VI.

4) Comparison with the q-ary erasure channel (QEC). In a QEC(ε) channel, the output symbol is equal to the input with a probability of 1 − ε and to an erasure with a probability of ε. Much of the analysis of Luby et al. [23] for LDPC codes over binary erasure channels is immediately applicable to GF(q) LDPC codes over QEC channels. It may be possible to gain insight into coset GF(q) LDPC codes from an analysis of their use over the QEC.

5) Better mappings. The mapping function δ(·) that was presented in Section VII-F was designed according to a concept that was developed heuristically. Further research may uncover better mapping methods.

6) Additional channels. The development in Section VII focuses on AWGN channels. It would be interesting to extend this development to additional types of channels.

7) Additional applications. In [3], coset GF(q) LDPC codes were used for transmission over the binary dirty-paper channel. By applying an appropriately designed quantization mapping (as discussed in Section III-C), a binary code was produced whose codewords' empirical distribution was approximately Bernoulli(1/4). There are many other applications, besides bandwidth-efficient transmission, that could similarly profit from codewords with a nonuniform empirical distribution.

B. Other Coset LDPC Codes 

In [1], other nonbinary LDPC ensembles, called BQC-LDPC and MQC-LDPC, are considered (besides coset GF(q) LDPC). Random-coset analysis, as defined earlier in this paper, applies to these codes as well. Similarly, the all-zero codeword assumption and the symmetry of message distributions (Definition 4 and the associated theorem) apply to these codes. With MQC-LDPC, the +i operation in the symmetry definition is evaluated using modulo-q arithmetic instead of over GF(q). With BQC-LDPC decoders, which use scalar messages, symmetry coincides with the standard binary definition of [29]. Channel equivalence as defined in Section V-C applies to MQC-LDPC codes, but not to BQC-LDPC codes.
C. Concluding Remarks

Coset GF{q) LDPC codes are a natural extension of binary LDPC codes to nonbinary channels. Our main 
contribution in this paper is the generalization of much of the analysis that was developed by Richardson et al. [28], 
[29], Chung et al. [9], ten Brink et al. [35] and Khandekar [20] from binary LDPC codes to coset GF{q) LDPC 



Random-coset analysis helps overcome the absence of output-symmetry. With it, we have generalized the all-zero codeword assumption, the symmetry property and channel equivalence. The random selection of the nonzero elements of the parity-check matrix (the labels) induces permutation-invariance on the messages. Although density evolution is not realizable, permutation-invariance enables its analysis (e.g. the stability property) and approximation (e.g. EXIT charts).

Analysis of GF(q) LDPC codes would not be interesting if their decoding complexity were prohibitive. Richardson and Urbanke [28] have suggested using the multidimensional DFT. This, coupled with an efficient recursive algorithm for computing the DFT, dramatically reduces the decoding complexity and makes coset GF(q) LDPC codes an attractive option.
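The sketch below shows why a transform makes the check-node step cheap. It assumes q = 2^m, so that addition in GF(q) is the bitwise XOR of m-bit labels and the multidimensional DFT reduces to the Walsh-Hadamard transform. It computes only the distribution of a sum of independent GF(q) symbols. It ignores the multiplication by the nonzero labels, which merely permutes vector entries. It is our illustration rather than the implementation of [28], and the function names are hypothetical. Each transform costs O(q log q) operations, compared with O(q^2) for a direct convolution.

```python
import numpy as np

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform; len(a) must be a power of two.
    For q = 2**m this is the m-dimensional DFT over (Z_2)^m."""
    a = np.array(a, dtype=float)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            left = a[start:start + h].copy()
            right = a[start + h:start + 2 * h].copy()
            a[start:start + h] = left + right
            a[start + h:start + 2 * h] = left - right
        h *= 2
    return a

def check_node_sum(incoming):
    """Distribution of the GF(2^m) sum (bitwise XOR) of independent symbols whose
    distributions are listed in `incoming`: a pointwise product in the transform domain."""
    q = len(incoming[0])
    prod = np.ones(q)
    for p in incoming:
        prod *= fwht(p)
    out = fwht(prod) / q          # the transform is its own inverse up to a factor q
    return out / out.sum()

def check_node_sum_direct(incoming):
    """Reference convolution under XOR, O(q^2) per incoming message."""
    q = len(incoming[0])
    out = np.zeros(q)
    out[0] = 1.0
    for p in incoming:
        new = np.zeros(q)
        for a in range(q):
            for b in range(q):
                new[a ^ b] += out[a] * p[b]
        out = new
    return out

rng = np.random.default_rng(1)
msgs = [rng.dirichlet(np.ones(8)) for _ in range(4)]   # four messages over GF(8)
fast = check_node_sum(msgs)
print(np.allclose(fast, check_node_sum_direct(msgs)))  # True
```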

Although our focus in this work has been on the decoding problem, the work of Richardson and Urbanke [27] on efficient encoding of binary LDPC codes is immediately applicable to coset GF(q) LDPC codes. For simulation purposes, however, a pleasing side-effect of our generalization of the all-zero codeword assumption is that no encoder needs to be implemented. In a random-coset setting, simulations may be performed on the all-zero codeword (of the underlying LDPC code) alone.

Using quantization or nonuniform-spaced mapping produces a substantial shaping gain. This, coupled with our generalization of EXIT charts, has enabled us to obtain codes within 0.56 dB of the Shannon limit at a spectral efficiency of 6 bits/s/Hz. To the best of our knowledge, these are the best codes found for this spectral efficiency. However, further research (perhaps along the lines of Section IX-A) may narrow this gap to the Shannon limit even further.

Appendix I 
Properties of the +g and ×g Operators

Lemma 13: For $g \in \mathrm{GF}(q)\setminus\{0\}$ and $i \in \mathrm{GF}(q)$,



3) $n(\mathbf{x}^{\times g}) = n(\mathbf{x})$

4) $n(\mathbf{x}^{+i}) = n(\mathbf{x})$

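Proof: The first two identities are proved by examining the kth element of both sides of each equation. The third identity is obtained from the second by observing that $(\mathbf{x}^{\times g})^{+i} = \mathbf{x}^{\times g}$ if and only if $\mathbf{x}^{+i\cdot g} = \mathbf{x}$, and that $i \mapsto i\cdot g$ is a one-to-one mapping of GF(q) onto itself. The fourth identity is straightforward. □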
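The operator identities can be checked numerically. The sketch below assumes that q is prime, so GF(q) arithmetic is arithmetic modulo q. It also assumes the definitions $(\mathbf{x}^{+i})_k = x_{k+i}$ and $(\mathbf{x}^{\times g})_k = x_{k\cdot g}$. These are consistent with the derivations in Appendices III and IV but are restated here as assumptions. The function names are illustrative. The sketch verifies identities 3 and 4 above, the two identities of Lemma 14, and the commutation relation $(\mathbf{x}^{+i})^{\times g} = (\mathbf{x}^{\times g})^{+i\cdot g^{-1}}$.

```python
from fractions import Fraction

Q = 5  # assumption: q is prime, so GF(q) arithmetic is arithmetic modulo q

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def scale(x, g):
    """x^{xg}: the k-th element is x_{k*g} (g nonzero); element x_0 is left unchanged."""
    return tuple(x[(k * g) % Q] for k in range(Q))

def orbit(x):
    """x^*: the set of all shifts of x."""
    return {shift(x, i) for i in range(Q)}

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

x = tuple(Fraction(k, 8) for k in (4, 1, 1, 1, 1))
uniform = tuple(Fraction(1, Q) for _ in range(Q))
print(n(x), n(uniform))  # 1 5

for g in range(1, Q):
    assert n(scale(x, g)) == n(x)                                  # Lemma 13, identity 3
    assert orbit(scale(x, g)) == {scale(z, g) for z in orbit(x)}   # Lemma 14, identity 1
    g_inv = pow(g, -1, Q)                                          # inverse of g modulo Q
    for i in range(Q):
        assert n(shift(x, i)) == n(x)                              # Lemma 13, identity 4
        assert orbit(shift(x, i)) == orbit(x)                      # Lemma 14, identity 2
        # commutation of +i and xg, checked directly
        assert scale(shift(x, i), g) == shift(scale(x, g), (i * g_inv) % Q)
```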

Lemma 14: For $g \in \mathrm{GF}(q)\setminus\{0\}$ and $i \in \mathrm{GF}(q)$,
Fig. 8. A neighborhood graph with cycles: (a) the Tanner graph; (b) a neighborhood graph; (c) the virtual neighborhood graph.

1) $(\mathbf{x}^{\times g})^* = (\mathbf{x}^*)^{\times g}$, where $(\mathbf{x}^*)^{\times g}$ denotes the result of applying the operation $\times g$ to all elements of $\mathbf{x}^*$.

2) $(\mathbf{x}^{+i})^* = \mathbf{x}^*$

Proof: The first identity follows from identity 2 of Lemma 13. The second identity is straightforward. □

Appendix II 
Neighborhood Graphs with Cycles 

Fig. 8(b) gives an example of a case where a neighborhood graph contains cycles. This neighborhood graph corresponds to the Tanner graph of Fig. 8(a).

When the neighborhood graph contains cycles, the APP values computed by a belief-propagation decoder correspond to a virtual neighborhood graph. In this graph, nodes that are contained in cycles are duplicated to artificially create a tree structure. For example, in Fig. 8(c), a variable-node 1' was produced by duplicating node 1. The APP values are computed according to the virtual code $\tilde{C}$ implied by this graph (see Frey et al. [15] for an elaborate discussion). $\tilde{C}$ is virtual in the sense that it is based on false assumptions regarding the channel model and the transmitted code. In Fig. 8(c), the channel model falsely assumes that nodes 1' and 1 correspond to different channel observations.

Appendix III
Proofs for Section V

A. Preliminary Lemmas

The proofs in this appendix focus on the properties of a message produced at some iteration t of coset GF(q) LDPC belief propagation at a node n. Assuming the underlying code C is fixed, this message is a function of the channel output y and the coset vector v. We therefore denote it by m(y, v).

m(y, v) may be either a rightbound message from a variable-node or a leftbound message to a variable-node. 
In both cases, we denote the variable-node involved by i. We begin with the following lemma. 

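Lemma 15: Let c be a codeword of C, y some given channel output, and v an arbitrary coset vector. Then

$$\mathbf{m}(\mathbf{y}, \mathbf{v}-\mathbf{c}) = \mathbf{m}(\mathbf{y}, \mathbf{v})^{-c_i} \qquad (28)$$

where $c_i$ is the value of c at codeword position i.

In the left-hand side of (28), v − c is evaluated componentwise over GF(q). In the right-hand side, we use the shift notation $\mathbf{x}^{+i}$ with $i = -c_i$, so that the kth element of $\mathbf{m}(\mathbf{y}, \mathbf{v})^{-c_i}$ is $m_{k-c_i}(\mathbf{y}, \mathbf{v})$.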
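Proof: $m_k(\mathbf{y}, \mathbf{v})$ satisfies

$$m_k(\mathbf{y}, \mathbf{v}) = \Pr\bigl[\sigma_i = k \mid \boldsymbol{\sigma} \in \tilde{C},\ \delta(\boldsymbol{\sigma} + \tilde{\mathbf{v}}) \text{ was transmitted and } \tilde{\mathbf{y}} \text{ was received}\bigr] \qquad (29)$$

The above expression is only an estimate of the true APP value. The code used by the decoder is not the LDPC code C, but rather the code $\tilde{C}$ defined by the parity-checks of the neighborhood graph spanned from n, as defined in Section IV-C and Appendix II. $\boldsymbol{\sigma}$ is a random variable representing the transmitted codeword of $\tilde{C}$ (prior to the addition of the coset vector), and $\sigma_i$ is its value at position i. The vectors $\tilde{\mathbf{v}}$ and $\tilde{\mathbf{y}}$ are constructed from v and y by including only values at nodes contained in the neighborhood graph of node n. We define $\tilde{\mathbf{c}}$ similarly. If the neighborhood graph contains cycles, we use the virtual neighborhood graph defined in Appendix II. For each variable-node that has duplicate copies in this graph, elements of the true y, v and c have duplicate entries in $\tilde{\mathbf{y}}$, $\tilde{\mathbf{v}}$ and $\tilde{\mathbf{c}}$.

The decoder assumes that all codewords are equally likely, hence (29) becomes

$$m_k(\mathbf{y}, \mathbf{v}) = \frac{\sum_{\boldsymbol{\sigma} \in \tilde{C}:\ \sigma_i = k} \Pr[\tilde{\mathbf{y}} \text{ received} \mid \delta(\boldsymbol{\sigma} + \tilde{\mathbf{v}}) \text{ was transmitted}]}{\sum_{\boldsymbol{\sigma} \in \tilde{C}} \Pr[\tilde{\mathbf{y}} \text{ received} \mid \delta(\boldsymbol{\sigma} + \tilde{\mathbf{v}}) \text{ was transmitted}]}$$

Equivalently, we obtain

$$m_k(\mathbf{y}, \mathbf{v}-\mathbf{c}) = \frac{\sum_{\boldsymbol{\sigma} \in \tilde{C}:\ \sigma_i = k} \Pr[\tilde{\mathbf{y}} \mid \delta(\boldsymbol{\sigma} + \tilde{\mathbf{v}} - \tilde{\mathbf{c}})]}{\sum_{\boldsymbol{\sigma} \in \tilde{C}} \Pr[\tilde{\mathbf{y}} \mid \delta(\boldsymbol{\sigma} + \tilde{\mathbf{v}} - \tilde{\mathbf{c}})]}$$

The word $\tilde{\mathbf{c}}$, having been constructed from a true codeword $\mathbf{c} \in C$, satisfies all parity-checks in the neighborhood graph and is therefore a codeword of $\tilde{C}$. Changing variables, we set $\boldsymbol{\sigma}' = \boldsymbol{\sigma} - \tilde{\mathbf{c}}$. Thus, for any $\boldsymbol{\sigma} \in \tilde{C}$ we have $\boldsymbol{\sigma}' \in \tilde{C}$, and the change of variables is one-to-one. The condition $\sigma_i = k$ now becomes $\sigma'_i = k - c_i$, and we have

$$m_k(\mathbf{y}, \mathbf{v}-\mathbf{c}) = \frac{\sum_{\boldsymbol{\sigma}' \in \tilde{C}:\ \sigma'_i = k - c_i} \Pr[\tilde{\mathbf{y}} \mid \delta(\boldsymbol{\sigma}' + \tilde{\mathbf{v}})]}{\sum_{\boldsymbol{\sigma}' \in \tilde{C}} \Pr[\tilde{\mathbf{y}} \mid \delta(\boldsymbol{\sigma}' + \tilde{\mathbf{v}})]} = m_{k-c_i}(\mathbf{y}, \mathbf{v}) = \bigl[\mathbf{m}(\mathbf{y}, \mathbf{v})^{-c_i}\bigr]_k$$

□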
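Lemma 15 can be checked by brute force on a toy code whose neighborhood graph covers the whole code, so that the exact APP computation coincides with the decoder's. The sketch makes the following assumptions, none of which come from the paper:

* q = 3, with arithmetic modulo 3;
* an identity mapping δ;
* a hypothetical four-output channel with rational transition probabilities;
* a length-2 single parity-check code.

Exact fractions make the comparison exact.

```python
import itertools
from fractions import Fraction as F

Q, N = 3, 2  # assumption: q prime (arithmetic mod q), toy block length 2
# Toy code: single parity check sigma_1 + sigma_2 = 0 over GF(3).
CODE = [s for s in itertools.product(range(Q), repeat=N) if sum(s) % Q == 0]
# Hypothetical channel with delta = identity: CHANNEL[a][b] = Pr[y = b | input a].
CHANNEL = [[F(1, 2), F(1, 4), F(1, 8), F(1, 8)],
           [F(1, 8), F(1, 2), F(1, 4), F(1, 8)],
           [F(1, 10), F(1, 5), F(1, 5), F(1, 2)]]

def likelihood(y, word):
    """Pr[y | delta(word)] for a memoryless channel."""
    p = F(1)
    for yj, xj in zip(y, word):
        p *= CHANNEL[xj][yj]
    return p

def message(y, v, i):
    """m(y, v): exact APP vector of symbol i under a uniform prior over CODE,
    given that delta(sigma + v) was transmitted."""
    num = [F(0)] * Q
    for s in CODE:
        num[s[i]] += likelihood(y, [(sj + vj) % Q for sj, vj in zip(s, v)])
    total = sum(num)
    return tuple(a / total for a in num)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

checked = 0
for y in itertools.product(range(4), repeat=N):
    for v in itertools.product(range(Q), repeat=N):
        for c in CODE:
            v_minus_c = [(vj - cj) % Q for vj, cj in zip(v, c)]
            for i in range(N):
                # Lemma 15: m(y, v - c) = m(y, v)^{-c_i}
                assert message(y, v_minus_c, i) == shift(message(y, v, i), -c[i])
                checked += 1
print(checked)  # 864
```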

We now examine X=m(Y, V), which denotes the rightbound (leftbound) message from (to) a variable-node i, at 
some iteration of belief-propagation. V and Y are random variables representing the coset vector and channel-output 
vectors, respectively. 
Lemma 16: For any $k \in \mathrm{GF}(q)$, the value $\Pr[\mathbf{X} = \mathbf{x} \mid c_i = k]$ is well-defined in the sense that, for any two codewords $\mathbf{c}^{(1)}, \mathbf{c}^{(2)} \in C$ that satisfy $c^{(1)}_i = c^{(2)}_i = k$,

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{c}^{(1)} \text{ was transmitted}] = \Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{c}^{(2)} \text{ was transmitted}]$$

for all probability vectors x.

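Proof: Let $\mathbf{c} = \mathbf{c}^{(1)} - \mathbf{c}^{(2)}$. Consider transmission of $\mathbf{c}^{(2)}$ with an arbitrary coset vector v, compared to transmission of $\mathbf{c}^{(1)}$ with the coset vector v − c. In both cases, the signal transmitted over the channel is $\delta(\mathbf{v} + \mathbf{c}^{(2)})$, and hence the probability of obtaining any particular y is identical. The word c satisfies $c_i = 0$. Since C is linear, we have $\mathbf{c} \in C$. Therefore, Lemma 15 (Appendix III-A) implies

$$\mathbf{m}(\mathbf{y}, \mathbf{v}-\mathbf{c}) = \mathbf{m}(\mathbf{y}, \mathbf{v})^{-c_i} = \mathbf{m}(\mathbf{y}, \mathbf{v}) \qquad (30)$$

We therefore obtain that

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} = \mathbf{v},\ \mathbf{c}^{(2)} \text{ was transmitted}] = \Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} = \mathbf{v}-\mathbf{c},\ \mathbf{c}^{(1)} \text{ was transmitted}]$$

Since V is uniformly distributed, averaging over all possible values of v completes the proof. □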
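The following lemma will be useful in Section VII-A.

Lemma 17: For any $k \in \mathrm{GF}(q)$,

$$\Pr[\mathbf{X} = \mathbf{x} \mid c_i = k] = \Pr[\mathbf{X} = \mathbf{x}^{+k} \mid c_i = 0]$$

Proof: The proof follows along the same lines as that of Lemma 16. Let $\mathbf{c}^{(1)}$ be the all-zero codeword, and $\mathbf{c}^{(2)}$ a codeword that satisfies $c^{(2)}_i = k$. Thus

$$\Pr[\mathbf{X} = \mathbf{x} \mid c_i = k] = \Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{c}^{(2)} \text{ was transmitted}]$$
$$\Pr[\mathbf{X} = \mathbf{x} \mid c_i = 0] = \Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{c}^{(1)} \text{ was transmitted}]$$

With $\mathbf{c} = \mathbf{c}^{(1)} - \mathbf{c}^{(2)}$ as before, we now have $c_i = -k$, and thus (30) becomes

$$\mathbf{m}(\mathbf{y}, \mathbf{v}-\mathbf{c}) = \mathbf{m}(\mathbf{y}, \mathbf{v})^{-c_i} = \mathbf{m}(\mathbf{y}, \mathbf{v})^{+k}$$

Thus,

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} = \mathbf{v},\ \mathbf{c}^{(2)} \text{ was transmitted}] = \Pr[\mathbf{X} = \mathbf{x}^{+k} \mid \mathbf{V} = \mathbf{v}-\mathbf{c},\ \mathbf{c}^{(1)} \text{ was transmitted}]$$

Averaging over all possible values of v completes the proof. □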

B. Proof of Lemma 1

Let c be some codeword. Let $E^t_{\mathbf{y},\mathbf{v}}(\mathbf{c})$ denote the event of error at a message produced at a variable-node i after iteration t, assuming the channel output was y, the coset vector was v and the true codeword was c. Recall the decision rule of Section V-A: the decoder decides $\arg\max_k \{m_k(\mathbf{y}, \mathbf{v})\}$, where $m_k(\mathbf{y}, \mathbf{v})$ is defined as in Appendix III-A. Using Lemma 15 (Appendix III-A), we obtain that the maximum of $\{m_k(\mathbf{y}, \mathbf{v})\}$ is obtained at 0 if and only if the maximum of $\{m_k(\mathbf{y}, \mathbf{v}-\mathbf{c})\}$ is obtained at $c_i$. Therefore

$$E^t_{\mathbf{y},\mathbf{v}-\mathbf{c}}(\mathbf{c}) = E^t_{\mathbf{y},\mathbf{v}}(\mathbf{0})$$

In both cases, the word transmitted over the channel is δ(v), and hence the probability of obtaining any channel output y is the same. Therefore we obtain

$$P^t_{e\mid\mathbf{v}-\mathbf{c}}(\mathbf{c}) = P^t_{e\mid\mathbf{v}}(\mathbf{0})$$

Finally, averaging over all instances of v, we obtain $P^t_e(\mathbf{c}) = P^t_e(\mathbf{0})$.

□

C. Proof of Lemma 2

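We first assume that X is symmetric and prove (19). Let w be an arbitrary LLR-vector and $\mathbf{x} = \mathrm{LLR}^{-1}(\mathbf{w})$. Let $\mathbf{x}^{+i}$ and $\mathbf{w}^{+i}$ denote the corresponding shifted vectors, so that $\mathbf{w}^{+i} = \mathrm{LLR}(\mathbf{x}^{+i})$ and $e^{w_i} = x_0/x_i$. Then

$$e^{w_i}\Pr[\mathbf{W} = \mathbf{w}^{+i}] = \frac{x_0}{x_i}\Pr[\mathbf{X} = \mathbf{x}^{+i}] = \frac{x_0}{x_i}\cdot x_i\cdot n(\mathbf{x}^{+i})\Pr[\mathbf{X} \in (\mathbf{x}^{+i})^*] = x_0\cdot n(\mathbf{x})\Pr[\mathbf{X} \in \mathbf{x}^*] = \Pr[\mathbf{X} = \mathbf{x}] = \Pr[\mathbf{W} = \mathbf{w}]$$

where we have relied on Lemmas 13 and 14 (Appendix I). This proves (19).

We now assume (19) and prove that X is symmetric. Let x and w be defined as above. Then

$$\Pr[\mathbf{X} \in \mathbf{x}^*] = \sum_{\mathbf{z} \in \mathbf{x}^*}\Pr[\mathbf{X} = \mathbf{z}] = \frac{1}{n(\mathbf{x})}\sum_{i=0}^{q-1}\Pr[\mathbf{X} = \mathbf{x}^{+i}] \qquad (31)$$

The last equality follows from the fact that n(z) = n(x) (Lemma 13, Appendix I). Hence each $\mathbf{z} \in \mathbf{x}^*$ is counted exactly n(x) times in $\sum_{i=0}^{q-1}\Pr[\mathbf{X} = \mathbf{x}^{+i}]$. We continue:

$$\Pr[\mathbf{X} \in \mathbf{x}^*] = \frac{1}{n(\mathbf{x})}\sum_{i=0}^{q-1}\Pr[\mathbf{W} = \mathbf{w}^{+i}] = \frac{1}{n(\mathbf{x})}\sum_{i=0}^{q-1}e^{-w_i}\Pr[\mathbf{W} = \mathbf{w}] = \frac{\Pr[\mathbf{W} = \mathbf{w}]}{n(\mathbf{x})}\sum_{i=0}^{q-1}e^{-w_i} = \frac{\Pr[\mathbf{W} = \mathbf{w}]}{n(\mathbf{x})\cdot x_0} = \frac{\Pr[\mathbf{X} = \mathbf{x}]}{n(\mathbf{x})\cdot x_0}$$

The equality before last follows from the inverse LLR transform, $x_0 = 1/\sum_{i=0}^{q-1}e^{-w_i}$, recalling that $w_0 = 0$ in all LLR vectors. We thus obtain that X is symmetric, as desired. □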

D. Proof of Theorem 

Let i be a variable-node associated with the message produced at n, defined as in Lemma 15 (Appendix III-A). Let $\tilde{C}$, $\tilde{\mathbf{v}}$ and $\tilde{\mathbf{y}}$ be defined as in the proof of that lemma. Using this notation, we may equivalently denote the message produced at n by $\mathbf{m}(\tilde{\mathbf{y}}, \tilde{\mathbf{v}})$. This is because the message is in fact a function only of the channel observations and coset-vector elements contained in the neighborhood graph spanning from n. For brevity, we write y and v for $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{v}}$ in the remainder of this proof. The following corollary follows immediately from the proof of Lemma 15.

Corollary 1: Let σ be a codeword of $\tilde{C}$. Then, for any y and v as defined above,

$$\mathbf{m}(\mathbf{y}, \mathbf{v}-\boldsymbol{\sigma}) = \mathbf{m}(\mathbf{y}, \mathbf{v})^{-\sigma_i} \qquad (32)$$

where $\sigma_i$ is the value of σ at the codeword position corresponding to the variable-node i.






We now return to X, a random variable corresponding to the message produced at n and equal to m(Y, V). We assume the plain-likelihood representation of messages. Let x be an arbitrary probability vector. Since we assume the all-zero codeword was transmitted, the random space consists of the random selection of v and the random channel transitions. Therefore,

$$\Pr[\mathbf{X} \in \mathbf{x}^*] = \sum_{\mathbf{y},\mathbf{v}:\ \mathbf{m}(\mathbf{y},\mathbf{v}) \in \mathbf{x}^*}\Pr[\mathbf{V} = \mathbf{v}, \mathbf{Y} = \mathbf{y}] \qquad (33)$$

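Let N be the block length of the code $\tilde{C}$. Like $\tilde{C}$, N is a function of the neighborhood graph spanning from n, and hence also of the iteration number. The set of all vectors $\mathbf{v} \in \{\mathrm{GF}(q)\}^N$ can be presented as a union of disjoint cosets of $\tilde{C}$. That is,

$$\{\mathrm{GF}(q)\}^N = \bigcup_{\mathbf{r} \in \mathcal{R}}\{\mathbf{r} + \tilde{C}\}$$

where $\mathcal{R}$ is a set of coset representatives with respect to $\tilde{C}$. For each vector $\mathbf{v} \in \{\mathrm{GF}(q)\}^N$, we let $\mathbf{r} \in \mathcal{R}$ and $\boldsymbol{\sigma} \in \tilde{C}$ denote the unique vectors that satisfy $\mathbf{v} = \mathbf{r} + \boldsymbol{\sigma}$.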

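Let y be a channel output portion and v a coset vector. From Corollary 1, we have $\mathbf{m}(\mathbf{y}, \mathbf{v}) = \mathbf{m}(\mathbf{y}, \mathbf{r}+\boldsymbol{\sigma}) = \mathbf{m}(\mathbf{y}, \mathbf{r})^{+\sigma_i}$. Therefore, by Lemma 14 (Appendix I), $\mathbf{m}(\mathbf{y}, \mathbf{v}) \in \mathbf{x}^*$ if and only if $\mathbf{m}(\mathbf{y}, \mathbf{r}) \in \mathbf{x}^*$. We can thus rewrite (33) as

$$\Pr[\mathbf{X} \in \mathbf{x}^*] = \sum_{\substack{\mathbf{y},\,\mathbf{r} \in \mathcal{R}:\\ \mathbf{m}(\mathbf{y},\mathbf{r}) \in \mathbf{x}^*}}\ \sum_{\mathbf{v} \in \{\mathbf{r}+\tilde{C}\}}\Pr[\mathbf{V} = \mathbf{v}, \mathbf{Y} = \mathbf{y}] = \sum_{\substack{\mathbf{y},\,\mathbf{r} \in \mathcal{R}:\\ \mathbf{m}(\mathbf{y},\mathbf{r}) \in \mathbf{x}^*}}\Pr[\mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] \qquad (34)$$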

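Examining Pr[X = x], we have

$$\Pr[\mathbf{X} = \mathbf{x}] = \Pr[\mathbf{X} = \mathbf{x},\ \mathbf{X} \in \mathbf{x}^*] = \sum_{\substack{\mathbf{y},\,\mathbf{r} \in \mathcal{R}:\\ \mathbf{m}(\mathbf{y},\mathbf{r}) \in \mathbf{x}^*}}\Pr[\mathbf{X} = \mathbf{x},\ \mathbf{V} \in \{\mathbf{r}+\tilde{C}\},\ \mathbf{Y} = \mathbf{y}]$$
$$= \sum_{\substack{\mathbf{y},\,\mathbf{r} \in \mathcal{R}:\\ \mathbf{m}(\mathbf{y},\mathbf{r}) \in \mathbf{x}^*}}\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}]\cdot\Pr[\mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] \qquad (35)$$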

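We now examine $\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}]$ for y and r such that $\mathbf{m}(\mathbf{y}, \mathbf{r}) \in \mathbf{x}^*$. The random space is confined to the random selection of the coset vector V from $\{\mathbf{r}+\tilde{C}\}$ or, equivalently, a random selection of $\mathbf{S} \in \tilde{C}$ such that $\mathbf{V} = \mathbf{r} + \mathbf{S}$.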

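Applying Corollary 1 again, we have, for $\mathbf{V} \in \{\mathbf{r}+\tilde{C}\}$ and assuming Y = y,

$$\mathbf{X} = \mathbf{m}(\mathbf{y}, \mathbf{V}) = \mathbf{m}(\mathbf{y}, \mathbf{r}+\mathbf{S}) = \mathbf{m}(\mathbf{y}, \mathbf{r})^{+S_i} = \mathbf{z}^{+S_i} \qquad (36)$$

where z = m(y, r). We assumed $\mathbf{m}(\mathbf{y}, \mathbf{r}) \in \mathbf{x}^*$, and therefore there exists some index l such that $\mathbf{z} = \mathbf{x}^{-l}$ (or equivalently $\mathbf{x} = \mathbf{z}^{+l}$). We first assume, for simplicity, that n(x) = 1. Therefore l is unique, and no other index l' satisfies $\mathbf{z} = \mathbf{x}^{-l'}$. From (36), we have that X = x if and only if $S_i = l$. Therefore,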

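$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] = \Pr[S_i = l \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}]$$
$$= \Pr[S_i = l \mid \mathbf{S} \in \tilde{C},\ \delta(\mathbf{r}+\mathbf{S}) \text{ was transmitted and } \mathbf{Y} = \mathbf{y} \text{ was received}]$$

Now the key observation in this proof is that, under the tree assumption, the above corresponds to $m_l(\mathbf{y}, \mathbf{r}) = z_l$. Therefore

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] = z_l = (\mathbf{x}^{-l})_l = x_0$$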
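We now consider the general case of n(x) = K, for arbitrary K. In this case there are exactly K indices $l_1, \ldots, l_K$ satisfying $\mathbf{z} = \mathbf{x}^{-l_k}$, k = 1, ..., K. Using the same arguments as before, we have

$$\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] = \Pr[S_i \in \{l_1, \ldots, l_K\} \mid \mathbf{S} \in \tilde{C},\ \delta(\mathbf{r}+\mathbf{S}) \text{ was transmitted and } \mathbf{Y} = \mathbf{y} \text{ was received}]$$
$$= \sum_{k=1}^{K}\Pr[S_i = l_k \mid \mathbf{S} \in \tilde{C},\ \delta(\mathbf{r}+\mathbf{S}) \text{ was transmitted and } \mathbf{Y} = \mathbf{y} \text{ was received}] = \sum_{k=1}^{K} z_{l_k} = \sum_{k=1}^{K}(\mathbf{x}^{-l_k})_{l_k} = \sum_{k=1}^{K} x_0 = n(\mathbf{x})\cdot x_0$$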

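Recalling (34) and (35), we now have

$$\Pr[\mathbf{X} = \mathbf{x}] = x_0\cdot n(\mathbf{x})\sum_{\substack{\mathbf{y},\,\mathbf{r} \in \mathcal{R}:\\ \mathbf{m}(\mathbf{y},\mathbf{r}) \in \mathbf{x}^*}}\Pr[\mathbf{V} \in \{\mathbf{r}+\tilde{C}\}, \mathbf{Y} = \mathbf{y}] = x_0\cdot n(\mathbf{x})\Pr[\mathbf{X} \in \mathbf{x}^*]$$

This proves (18). □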
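The symmetry property (18) can be verified exactly on a toy example. The sketch makes the following assumptions:

* q = 3, with arithmetic modulo 3;
* a length-2 single parity-check code;
* a hypothetical non-injective mapping δ onto a binary-input channel with three outputs;
* rational transition probabilities.

The sketch enumerates every coset vector and channel output under the all-zero codeword. For every message in the support, it checks $\Pr[\mathbf{X}=\mathbf{x}] = x_0\cdot n(\mathbf{x})\Pr[\mathbf{X}\in\mathbf{x}^*]$. Because the graph of this code is a tree, the exact APP used here plays the role of the belief-propagation message.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

Q, N = 3, 2          # assumption: q prime (arithmetic mod q), toy block length 2
CODE = [s for s in product(range(Q), repeat=N) if sum(s) % Q == 0]  # sigma_1 + sigma_2 = 0
DELTA = [0, 1, 1]    # hypothetical mapping from GF(3) to a binary channel input
PHYS = [[F(5, 8), F(1, 4), F(1, 8)],    # hypothetical Pr[y | x' = 0]
        [F(1, 8), F(1, 4), F(5, 8)]]    # hypothetical Pr[y | x' = 1]

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

def likelihood(y, word):
    """Pr[y | delta(word)] for a memoryless channel."""
    p = F(1)
    for yj, xj in zip(y, word):
        p *= PHYS[DELTA[xj]][yj]
    return p

def message(y, v, i=0):
    """Exact APP vector of symbol i under a uniform prior over CODE, coset vector v."""
    num = [F(0)] * Q
    for s in CODE:
        num[s[i]] += likelihood(y, [(sj + vj) % Q for sj, vj in zip(s, v)])
    total = sum(num)
    return tuple(a / total for a in num)

# Distribution of X = m(Y, V): with the all-zero codeword, the channel input is delta(v).
dist = defaultdict(F)
for v in product(range(Q), repeat=N):
    for y in product(range(3), repeat=N):
        dist[message(y, v)] += F(1, Q ** N) * likelihood(y, v)

# Symmetry (18): Pr[X = x] = x_0 * n(x) * Pr[X in x*].
symmetric_ok = all(
    p == x[0] * n(x) * sum(dist.get(z, 0) for z in {shift(x, i) for i in range(Q)})
    for x, p in dist.items())
print(symmetric_ok)  # True
```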



E. The Sum of Two Symmetric Variables

The following lemma is used in Section VI-D.

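Lemma 18: Let A and B be two independent LLR-vector random variables. If A and B are symmetric, then A + B is symmetric too.

Proof: The proof relies on the observation that, for all $i \in \mathrm{GF}(q)$ and LLR vectors a and b, $(\mathbf{a}+\mathbf{b})^{+i} = \mathbf{a}^{+i} + \mathbf{b}^{+i}$. Let w be an LLR-vector and $i \in \mathrm{GF}(q)$ an arbitrary element. Then

$$\Pr[\mathbf{A}+\mathbf{B} = \mathbf{w}] = \sum_{\mathbf{a}+\mathbf{b} = \mathbf{w}}\Pr[\mathbf{A} = \mathbf{a}]\cdot\Pr[\mathbf{B} = \mathbf{b}] = \sum_{(\mathbf{a}+\mathbf{b})^{+i} = \mathbf{w}^{+i}} e^{a_i}\Pr[\mathbf{A} = \mathbf{a}^{+i}]\cdot e^{b_i}\Pr[\mathbf{B} = \mathbf{b}^{+i}]$$
$$= e^{w_i}\sum_{\mathbf{a}^{+i}+\mathbf{b}^{+i} = \mathbf{w}^{+i}}\Pr[\mathbf{A} = \mathbf{a}^{+i}]\cdot\Pr[\mathbf{B} = \mathbf{b}^{+i}] = e^{w_i}\Pr[\mathbf{A}+\mathbf{B} = \mathbf{w}^{+i}]$$

□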
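The following is a small exact check of Lemma 18. It assumes q = 3 and builds symmetric LLR distributions from the outputs of hypothetical cyclic-symmetric channels of the form $\Pr[\mathbf{y}\mid x=i] = y_i\, n(\mathbf{y})\, Q(\mathbf{y}^*)$, which is our reading of Definition 5. Each LLR vector $\mathbf{w}$ is stored as the tuple $(e^{w_0},\ldots,e^{w_{q-1}})$, so all arithmetic stays exact. In this representation, adding LLR vectors becomes a componentwise product. The helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

Q = 3  # assumption: q prime (arithmetic mod q)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

def symmetric_llr(bases):
    """Distribution of W = LLR(Y), where Y is the output of a cyclic-symmetric channel given
    x = 0.  Each LLR vector is stored as the ratios e^{w_k} = y_0 / y_k.
    bases: list of (probability vector, Q(y*)) pairs."""
    dist = defaultdict(F)
    for b, weight in bases:
        for y in {shift(b, i) for i in range(Q)}:
            dist[tuple(y[0] / y[k] for k in range(Q))] += y[0] * n(y) * weight
    return dict(dist)

def llr_shift(r, i):
    """e^{w^{+i}} from e^{w}, using (w^{+i})_k = w_{k+i} - w_i."""
    return tuple(r[(k + i) % Q] / r[i] for k in range(Q))

def is_symmetric(dist):
    """Check Pr[W = w] = e^{w_i} Pr[W = w^{+i}] for all w in the support and all i."""
    return all(p == r[i] * dist.get(llr_shift(r, i), 0)
               for r, p in dist.items() for i in range(Q))

def add(da, db):
    """Distribution of A + B for independent A, B (a product in the e^{w} domain)."""
    out = defaultdict(F)
    for (ra, pa), (rb, pb) in product(da.items(), db.items()):
        out[tuple(a * b for a, b in zip(ra, rb))] += pa * pb
    return dict(out)

A = symmetric_llr([((F(1, 2), F(1, 3), F(1, 6)), F(1, 2)),
                   ((F(3, 5), F(1, 5), F(1, 5)), F(1, 2))])
B = symmetric_llr([((F(7, 10), F(1, 5), F(1, 10)), F(1))])
print(is_symmetric(A), is_symmetric(B), is_symmetric(add(A, B)))  # True True True
```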



F. Proof of Lemma 3

By definition, component i of APP(y) satisfies

$$\mathrm{APP}(\mathbf{y})_i = \alpha\Pr[\mathbf{Y} = \mathbf{y} \mid x = i]$$

where α is some constant, independent of i (but dependent on y), selected such that the sum of the vector components is 1. Using Definition 5, we have

$$\mathrm{APP}(\mathbf{y})_i = \alpha\cdot y_i\cdot n(\mathbf{y})\cdot Q(\mathbf{y}^*) = \bigl(\alpha\cdot n(\mathbf{y})\cdot Q(\mathbf{y}^*)\bigr)\cdot y_i$$

Since y is the output of the equivalent channel, it is a probability vector, and thus the sum of all components of y is 1. Hence $\alpha\cdot n(\mathbf{y})\cdot Q(\mathbf{y}^*) = 1$. We therefore obtain our desired result

$$\mathrm{APP}(\mathbf{y})_i = y_i$$

□
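Lemma 3 can be illustrated on a hypothetical finite cyclic-symmetric channel. The sketch assumes q = 3. It constructs the channel from a few base probability vectors and orbit weights $Q(\mathbf{y}^*)$, using $\Pr[\mathbf{y}\mid x=i] = y_i\, n(\mathbf{y})\, Q(\mathbf{y}^*)$ as in the proof above. The constructor name is illustrative.

```python
from fractions import Fraction as F

Q = 3  # assumption: q prime (arithmetic mod q)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

def cyclic_symmetric_channel(bases):
    """Hypothetical constructor: the outputs are all shifts of each base probability vector,
    with Pr[y | x = i] = y_i * n(y) * Q(y*).  `bases` holds (vector, Q(y*)) pairs whose
    weights sum to 1; distinct bases must lie in distinct orbits."""
    channel = {}
    for b, weight in bases:
        for y in {shift(b, i) for i in range(Q)}:
            channel[y] = [y[i] * n(y) * weight for i in range(Q)]
    return channel

channel = cyclic_symmetric_channel([((F(1, 2), F(1, 3), F(1, 6)), F(1, 2)),
                                    ((F(3, 5), F(1, 5), F(1, 5)), F(3, 10)),
                                    ((F(1, 3), F(1, 3), F(1, 3)), F(1, 5))])

rows_valid = all(sum(lik[i] for lik in channel.values()) == 1 for i in range(Q))
# Lemma 3: normalizing the likelihoods of output y gives back y itself.
app_is_output = all(tuple(l / sum(lik) for l in lik) == y for y, lik in channel.items())
print(rows_valid, app_is_output)  # True True
```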



G. Proof of Lemma 4

Let Y be a random variable denoting the equivalent channel output, and assume the equivalent channel's input (denoted x in Fig. 4) was zero. Y thus corresponds to a vector of APP probabilities, computed using the physical channel output y' and the coset vector component v. We can therefore invoke Theorem 1 and obtain that, for any probability vector y,

$$\Pr[\mathbf{Y} = \mathbf{y} \mid x = 0] = y_0\cdot n(\mathbf{y})\cdot\Pr[\mathbf{Y} \in \mathbf{y}^* \mid x = 0]$$

Note that Theorem 1 requires that the entire transmitted codeword be zero, not only the symbol at a particular discrete channel time. However, since the initial message is a function of a single channel output, we can relax this requirement by considering a code that contains a single symbol.

Let i be an arbitrary symbol from the code alphabet. Applying Lemma 17 (Appendix III-A) to the single-symbol code, we obtain

$$\Pr[\mathbf{Y} = \mathbf{y} \mid x = i] = \Pr[\mathbf{Y} = \mathbf{y}^{+i} \mid x = 0] = y_i\cdot n(\mathbf{y})\cdot\Pr[\mathbf{Y} \in \mathbf{y}^* \mid x = 0]$$

Therefore the equivalent channel is cyclic-symmetric. □



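H. Proof of Lemma 5

Consider the following set of random variables, defined as in Fig. 4. X is the input to the equivalent channel. V is the coset symbol, and U = X + V, evaluated over GF(q). X' = δ(U) is the physical channel input, and Y' is the physical channel output, related to X' through the channel transition probabilities. Y = APP(Y', V) equals the output of the equivalent channel, which is a deterministic function of Y' and V.

Since the equivalent channel is symmetric, a uniformly distributed X renders I(X; Y) equal to the equivalent channel's capacity. This choice of X renders U uniformly distributed as well, and thus $C_\delta = I(U; Y')$. We will now show that I(U; Y') = I(X; Y).

$$I(U;Y') = E\log\frac{\Pr[Y' \mid U]}{\Pr[Y']} = \sum_{u=0}^{q-1}\sum_{y' \in \mathcal{Y}}\Pr[Y' = y', U = u]\log\frac{\Pr[Y' = y' \mid X' = \delta(u)]}{\frac{1}{q}\sum_{u'=0}^{q-1}\Pr[Y' = y' \mid X' = \delta(u')]}$$
$$= \sum_{i=0}^{q-1}\sum_{v=0}^{q-1}\sum_{y' \in \mathcal{Y}}\Pr[X = i, V = v, Y' = y']\log\frac{q\cdot\Pr[Y' = y' \mid X' = \delta(v+i)]}{\sum_{u'=0}^{q-1}\Pr[Y' = y' \mid X' = \delta(u')]} = E\log(q\cdot\mathbf{Y}_X) \qquad (37)$$

where $\mathcal{Y}$ denotes the physical channel's output alphabet, and $\mathbf{Y}_X$ denotes the element of Y at index number X. The last equality holds because $\mathrm{APP}(y', v)_i = \Pr[Y' = y' \mid X' = \delta(v+i)] / \sum_{u'}\Pr[Y' = y' \mid X' = \delta(u')]$.

$$I(X;\mathbf{Y}) = E\log\frac{\Pr[\mathbf{Y} \mid X]}{\Pr[\mathbf{Y}]} = \sum_{i=0}^{q-1}\sum_{\mathbf{y} \in \mathcal{P}}\Pr[\mathbf{Y} = \mathbf{y}, X = i]\log\frac{\Pr[\mathbf{Y} = \mathbf{y} \mid X = i]}{\frac{1}{q}\sum_{i'=0}^{q-1}\Pr[\mathbf{Y} = \mathbf{y} \mid X = i']}$$

where $\mathcal{P}$ is the set of all probability vectors. Using Lemma 4 and Definition 5, we have, for some probability function $Q(\mathbf{y}^*)$,

$$\Pr[\mathbf{Y} = \mathbf{y} \mid X = i] = y_i\cdot n(\mathbf{y})\cdot Q(\mathbf{y}^*)$$

By definition of y as a probability vector, we have $\sum_{i'=0}^{q-1} y_{i'} = 1$, and thus

$$I(X;\mathbf{Y}) = \sum_{i=0}^{q-1}\sum_{\mathbf{y} \in \mathcal{P}}\Pr[\mathbf{Y} = \mathbf{y}, X = i]\log\frac{y_i\cdot n(\mathbf{y})Q(\mathbf{y}^*)}{\frac{1}{q}\cdot n(\mathbf{y})Q(\mathbf{y}^*)} = \sum_{i=0}^{q-1}\sum_{\mathbf{y} \in \mathcal{P}}\Pr[\mathbf{Y} = \mathbf{y}, X = i]\log(q\cdot y_i) = E\log(q\cdot\mathbf{Y}_X) \qquad (38)$$

Combining (37) with (38) completes the proof. □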
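The chain of equalities in (37) and (38) can be checked numerically. The sketch makes the following assumptions, all of which are illustrative rather than taken from the paper:

* q = 3, with arithmetic modulo 3;
* a hypothetical quantization mapping δ = (0, 0, 1) onto a binary-input physical channel with three outputs;
* uniform X and V.

The sketch builds the equivalent channel from the APP computation and confirms the shift relation used in the proof of Lemma 4. It then compares I(U;Y'), I(X;Y) and E[log(q·Y_X)].

```python
import math
from collections import defaultdict
from fractions import Fraction as F

Q = 3                  # assumption: q prime, GF(q) arithmetic modulo q
DELTA = [0, 0, 1]      # hypothetical quantization mapping GF(3) -> {0, 1}
PHYS = [[F(6, 10), F(3, 10), F(1, 10)],   # hypothetical Pr[y' | x' = 0]
        [F(1, 10), F(2, 10), F(7, 10)]]   # hypothetical Pr[y' | x' = 1]
OUTS = range(3)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def app(y_phys, v):
    """Equivalent-channel output: APP vector of X given y' and the coset symbol v."""
    lik = [PHYS[DELTA[(x + v) % Q]][y_phys] for x in range(Q)]
    total = sum(lik)
    return tuple(l / total for l in lik)

# Equivalent channel: trans[i][y] = Pr[Y = y | X = i], with V uniform over GF(q).
trans = [defaultdict(F) for _ in range(Q)]
for i in range(Q):
    for v in range(Q):
        for b in OUTS:
            trans[i][app(b, v)] += F(1, Q) * PHYS[DELTA[(i + v) % Q]][b]

# Shift relation from the proof of Lemma 4: Pr[Y = y | X = i] = Pr[Y = y^{+i} | X = 0].
shift_ok = all(p == trans[0].get(shift(y, i), 0) for i in range(Q) for y, p in trans[i].items())

# I(U; Y') with U uniform, directly from the physical channel.
i_uy = 0.0
for u in range(Q):
    for b in OUTS:
        p_y = sum(F(1, Q) * PHYS[DELTA[w]][b] for w in range(Q))
        p_joint = F(1, Q) * PHYS[DELTA[u]][b]
        i_uy += float(p_joint) * math.log(float(p_joint / (F(1, Q) * p_y)))

# I(X; Y) with X uniform, directly from the equivalent channel.
i_xy = 0.0
for y in set().union(*trans):
    p_y = sum(F(1, Q) * trans[i].get(y, 0) for i in range(Q))
    for i in range(Q):
        p = trans[i].get(y, 0)
        if p:
            i_xy += float(F(1, Q) * p) * math.log(float(p / p_y))

# E[log(q * Y_X)], the common value in (37) and (38).
e_log = sum(float(F(1, Q) * p) * math.log(Q * float(y[i]))
            for i in range(Q) for y, p in trans[i].items())

print(shift_ok, abs(i_uy - e_log) < 1e-12, abs(i_xy - e_log) < 1e-12)  # True True True
```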



Appendix IV 
Proofs for Section VI



A. Proof of Theorem 



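We prove the theorem for $\mathbf{R}_t$. $\mathbf{R}_t$ is the message at iteration t, averaged over all possibilities of the neighborhood tree $\mathcal{T}_t$:

$$\Pr[\mathbf{R}_t = \mathbf{x}] = \sum_{\mathcal{T}_t}\Pr[\mathbf{R}_t = \mathbf{x} \mid \mathcal{T}_t]\cdot\Pr[\mathcal{T}_t] = \sum_{\mathcal{T}_t} x_0\cdot n(\mathbf{x})\Pr[\mathbf{R}_t \in \mathbf{x}^* \mid \mathcal{T}_t]\cdot\Pr[\mathcal{T}_t]$$

The last equation was obtained from Theorem 1. Hence

$$\Pr[\mathbf{R}_t = \mathbf{x}] = x_0\cdot n(\mathbf{x})\sum_{\mathcal{T}_t}\Pr[\mathbf{R}_t \in \mathbf{x}^* \mid \mathcal{T}_t]\cdot\Pr[\mathcal{T}_t] = x_0\cdot n(\mathbf{x})\Pr[\mathbf{R}_t \in \mathbf{x}^*]$$

Hence $\mathbf{R}_t$ is symmetric, as desired ($\mathbf{R}^{(0)} = \mathbf{R}_0$ is obtained as a special case). The proof for $\mathbf{L}_t$ is similar. □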



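B. Proof of Lemma 8

Let $g = j/i$ (evaluated over GF(q)). Then

$$\Pr[X_i = x] = \Pr[(\mathbf{X}^{\times g})_i = x] = \Pr[X_{i\cdot g} = x] = \Pr[X_j = x]$$

where the first equality follows from the permutation-invariance of X. The proof for W is identical. □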

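C. Proof of Lemma 9

First, we observe that $(\mathbf{w}^{+k})_{-k} = w_0 - w_k = -w_k$. We now have

$$\Pr[W_k = w] = \sum_{\mathbf{w}:\ w_k = w}\Pr[\mathbf{W} = \mathbf{w}] = \sum_{\mathbf{w}:\ w_k = w}e^{w_k}\Pr[\mathbf{W} = \mathbf{w}^{+k}] = e^{w}\sum_{\mathbf{w}:\ w_k = w}\Pr[\mathbf{W} = \mathbf{w}^{+k}]$$
$$= e^{w}\sum_{\mathbf{w}':\ w'_{-k} = -w}\Pr[\mathbf{W} = \mathbf{w}'] = e^{w}\Pr[W_{-k} = -w] = e^{w}\Pr[W_k = -w]$$

The second equality follows from (19), the fourth from the change of variables $\mathbf{w}' = \mathbf{w}^{+k}$, and the last from Lemma 8. □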
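The binary symmetry established above can be checked exactly for a small symmetric, permutation-invariant distribution. The sketch assumes q = 3. It starts from the symmetric LLR distribution induced by a hypothetical cyclic-symmetric channel with $\Pr[\mathbf{y}\mid x=i] = y_i\, n(\mathbf{y})\, Q(\mathbf{y}^*)$. It then makes that distribution permutation-invariant by applying a uniformly random label, giving $\mathbf{W}^{\times g}$. LLR vectors are stored exactly as $(e^{w_0},\ldots,e^{w_{q-1}})$, so $-w$ corresponds to $1/t$ for $t = e^{w}$. All names and values are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

Q = 3  # assumption: q prime (arithmetic mod q)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

def symmetric_llr(bases):
    """Symmetric LLR distribution induced by a cyclic-symmetric channel given x = 0.
    An LLR vector w is stored exactly as (e^{w_k}) = (y_0 / y_k)."""
    dist = defaultdict(F)
    for b, weight in bases:
        for y in {shift(b, i) for i in range(Q)}:
            dist[tuple(y[0] / y[k] for k in range(Q))] += y[0] * n(y) * weight
    return dist

def random_permutation(dist):
    """Distribution of W^{xg} with g uniform over the nonzero elements: (w^{xg})_k = w_{k*g}."""
    out = defaultdict(F)
    for r, p in dist.items():
        for g in range(1, Q):
            out[tuple(r[(k * g) % Q] for k in range(Q))] += p / (Q - 1)
    return out

def marginal(dist, k):
    """Distribution of e^{W_k}."""
    out = defaultdict(F)
    for r, p in dist.items():
        out[r[k]] += p
    return out

W = random_permutation(symmetric_llr([((F(1, 2), F(1, 3), F(1, 6)), F(1, 2)),
                                      ((F(3, 5), F(1, 5), F(1, 5)), F(1, 2))]))

# Binary symmetry of W_k (k != 0): Pr[W_k = w] = e^{w} Pr[W_k = -w],
# i.e. P(t) = t * P(1/t) with t = e^{w}.
binary_symmetric = all(p == t * marginal(W, k).get(1 / t, 0)
                       for k in range(1, Q) for t, p in marginal(W, k).items())
print(binary_symmetric)  # True
```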

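D. Proof of Lemma 10

We prove the lemma for the probability-vector representation. The proof for the LLR-vector representation is identical. We first assume that $\mathbf{X} = \tilde{\mathbf{T}}$ is a random-permutation of some T, and show that X is permutation-invariant. Let $g \in \mathrm{GF}(q)\setminus\{0\}$ be randomly selected as in the definition of a random-permutation, such that $\mathbf{X} = \mathbf{T}^{\times g}$. Let $g' \in \mathrm{GF}(q)\setminus\{0\}$ be arbitrary, and let $\mathbf{H} = \mathbf{X}^{\times g'}$. Then

$$\mathbf{H} = (\mathbf{T}^{\times g})^{\times g'} = \mathbf{T}^{\times(g\cdot g')} \qquad (39)$$

$g\cdot g'$ is a random variable, independent of T, that is distributed identically with g. Thus, H is identically distributed with $\mathbf{T}^{\times g} = \mathbf{X}$. Since g' was arbitrary, we obtain that X is permutation-invariant.

We now assume that X is permutation-invariant. Consider $\mathbf{T} = \mathbf{X}^{\times g^{-1}}$, where g is uniformly random in $\mathrm{GF}(q)\setminus\{0\}$ and independent of X. Equivalently, $\mathbf{X} = \mathbf{T}^{\times g}$. We now show that T is independent of g:

$$\Pr[\mathbf{T} = \mathbf{t} \mid g] = \Pr[\mathbf{X}^{\times g^{-1}} = \mathbf{t} \mid g] = \Pr[\mathbf{X} = \mathbf{t}]$$

The last result follows from the definition of X as permutation-invariant. Since the above is true for all g, T is independent of g. Thus X is a random-permutation of T, that is, $\mathbf{X} = \tilde{\mathbf{T}}$, as desired. □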
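E. Some Lemmas Involving Permutation-Invariance

We now present some lemmas that are used in Appendices IV-F and V and in Section VI-D. The first three lemmas apply to both the probability-vector and LLR representations of vectors.

Lemma 19: If $\tilde{\mathbf{X}}$ is a random-permutation of X, then $P_e(\tilde{\mathbf{X}}) = P_e(\mathbf{X})$.

The proof of this lemma is obtained from the fact that the operation $\times g$, for all g, leaves element $x_0$ unchanged.

Lemma 20: If X is a symmetric random variable, and $\tilde{\mathbf{X}}$ is a random-permutation of X, then $\tilde{\mathbf{X}}$ is also symmetric.

Proof:

$$\Pr[\tilde{\mathbf{X}} = \mathbf{x} \mid \tilde{\mathbf{X}} \in \mathbf{x}^*] = \sum_{g \in \mathrm{GF}(q)\setminus\{0\}}\Pr[\tilde{\mathbf{X}} = \mathbf{x} \mid \tilde{\mathbf{X}} \in \mathbf{x}^*, g]\cdot\Pr[g \mid \tilde{\mathbf{X}} \in \mathbf{x}^*] \qquad (40)$$

In the following derivation, we use the facts that $n(\mathbf{x}^{\times g}) = n(\mathbf{x})$ (see Lemma 13, Appendix I) and $(\mathbf{x}^*)^{\times g} = (\mathbf{x}^{\times g})^*$ (see Lemma 14, Appendix I):

$$\Pr[\tilde{\mathbf{X}} = \mathbf{x} \mid \tilde{\mathbf{X}} \in \mathbf{x}^*, g] = \Pr[\mathbf{X}^{\times g} = \mathbf{x} \mid \mathbf{X}^{\times g} \in \mathbf{x}^*] = \Pr[\mathbf{X} = \mathbf{x}^{\times g^{-1}} \mid \mathbf{X} \in (\mathbf{x}^*)^{\times g^{-1}}]$$
$$= \Pr[\mathbf{X} = \mathbf{x}^{\times g^{-1}} \mid \mathbf{X} \in (\mathbf{x}^{\times g^{-1}})^*] = (\mathbf{x}^{\times g^{-1}})_0\cdot n(\mathbf{x}^{\times g^{-1}}) = x_0\cdot n(\mathbf{x}) \qquad (41)$$

Combining (40) and (41), we obtain

$$\Pr[\tilde{\mathbf{X}} = \mathbf{x} \mid \tilde{\mathbf{X}} \in \mathbf{x}^*] = x_0\cdot n(\mathbf{x})$$

which concludes the proof. □

Lemma 21: If X is permutation-invariant and $\tilde{\mathbf{X}}$ is a random-permutation of X, then X and $\tilde{\mathbf{X}}$ are identically distributed.

The proof of this lemma is straightforward from the definitions of a random-permutation and of permutation-invariance (see Definition 8).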

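The following lemmas discuss permutation-invariance in the context of the LLR representation of random variables.

Lemma 22: Let A and B be two independent, permutation-invariant LLR-vector random variables. Then W = A + B is also permutation-invariant.

Proof: Let $g \in \mathrm{GF}(q)\setminus\{0\}$ and $\boldsymbol{\Omega} = \mathbf{W}^{\times g}$. Let w be an arbitrary LLR-vector. Then

$$\Pr[\boldsymbol{\Omega} = \mathbf{w}] = \Pr[(\mathbf{A}+\mathbf{B})^{\times g} = \mathbf{w}] = \Pr[\mathbf{A}^{\times g} + \mathbf{B}^{\times g} = \mathbf{w}] = \sum_{\mathbf{a}+\mathbf{b} = \mathbf{w}}\Pr[\mathbf{A}^{\times g} = \mathbf{a}]\cdot\Pr[\mathbf{B}^{\times g} = \mathbf{b}]$$
$$= \sum_{\mathbf{a}+\mathbf{b} = \mathbf{w}}\Pr[\mathbf{A} = \mathbf{a}]\cdot\Pr[\mathbf{B} = \mathbf{b}] = \Pr[\mathbf{A}+\mathbf{B} = \mathbf{w}] = \Pr[\mathbf{W} = \mathbf{w}]$$

Since g and w are arbitrary, W is permutation-invariant, as desired. □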
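Lemma 23: Let A and B be two LLR-vector random variables. Let g, h and k be independent random variables, uniformly distributed in $\mathrm{GF}(q)\setminus\{0\}$ and independent of A and B. Let $\tilde{\mathbf{A}} = \mathbf{A}^{\times g}$, $\mathbf{W} = \tilde{\mathbf{A}} + \mathbf{B}$ and $\tilde{\mathbf{W}} = \mathbf{W}^{\times h}$. Let $\tilde{\mathbf{B}} = \mathbf{B}^{\times k}$, $\boldsymbol{\Omega} = \tilde{\mathbf{A}} + \tilde{\mathbf{B}}$ and $\tilde{\boldsymbol{\Omega}} = \boldsymbol{\Omega}^{\times h}$. Then $\tilde{\mathbf{W}}$, $\boldsymbol{\Omega}$ and $\tilde{\boldsymbol{\Omega}}$ are identically distributed.

Proof: We begin with the following equalities:

$$\tilde{\mathbf{W}} = (\mathbf{A}^{\times g} + \mathbf{B})^{\times h} = \mathbf{A}^{\times(g\cdot h)} + \mathbf{B}^{\times h}, \qquad \boldsymbol{\Omega} = \mathbf{A}^{\times g} + \mathbf{B}^{\times k}$$

Consider the expressions for $\tilde{\mathbf{W}}$ and $\boldsymbol{\Omega}$. $g\cdot h$ is identically distributed with g, and h is identically distributed with k. Moreover, $g\cdot h$ is independent of h, and both are independent of A and B. The same holds if we replace $g\cdot h$ and h with g and k. Thus $\tilde{\mathbf{W}}$ and $\boldsymbol{\Omega}$ are identically distributed. The proof for $\tilde{\boldsymbol{\Omega}}$ is similar. □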

F. Proof of Theorem 

$\mathbf{L}_t$ is permutation-invariant following the discussion at the beginning of Section VI-B, and thus Part 1 of the theorem is proved.

$\tilde{\mathbf{R}}_t = (\mathbf{R}_t)^{\times g}$, where the label g is randomly selected, uniformly from $\mathrm{GF}(q)\setminus\{0\}$. Thus $\tilde{\mathbf{R}}_t$ is a random-permutation of $\mathbf{R}_t$, and by Lemma 10 it is permutation-invariant. $\tilde{\mathbf{R}}_t$ is symmetric by Lemma 20 (Appendix IV-E), and $P_e(\tilde{\mathbf{R}}_t) = P_e(\mathbf{R}_t)$ by Lemma 19 (Appendix IV-E). This proves Part 2 of the theorem.

$\tilde{\mathbf{R}}^{(0)}$ is permutation-invariant by its construction, being a random-permutation of $\mathbf{R}^{(0)}$. Switching to the LLR representation, $\mathbf{R}'_t$ is obtained by applying expression (15). The leftbound messages are permutation-invariant. Hence, by Lemma 22 (Appendix IV-E), the sum $\sum_{i=1}^{d-1}\mathbf{L}'^{(i)}_t$ is also permutation-invariant. By Lemma 23 (Appendix IV-E), the distribution of $\tilde{\mathbf{R}}'_t$ may equivalently be computed by replacing the instantiation $\mathbf{r}'^{(0)}$ of $\mathbf{R}'^{(0)}$ in (15) with an instantiation of $\tilde{\mathbf{R}}'^{(0)}$.

In density evolution, the distribution of the leftbound messages is computed recursively from $\tilde{\mathbf{R}}_t$, using (10). Thus, the above discussion implies that replacing $\mathbf{R}^{(0)}$ with $\tilde{\mathbf{R}}^{(0)}$ would not affect this density either. The remainder of Part 3 of the theorem follows from the lemmas of Appendix IV-E. □

G. Non-Degeneracy of Channels and Mappings 

A mapping is non-degenerate if there exists no integer n > 1 such that, for all $a \in \mathcal{A}$, the number of elements $x \in \mathrm{GF}(q)$ satisfying δ(x) = a is a multiple of n. With quantization mapping, such a degenerate mapping could be replaced by a simpler quantization over an alphabet of size q/n that would equally attain the desired input distribution Q(a). With nonuniform-spaced mapping, the number of elements mapped to each $a \in \mathcal{A}$ is 1, and thus this requirement is typically satisfied.

A channel is non-degenerate if there exist no distinct values $a_1, a_2 \in \mathcal{A}$ such that $\Pr[y \mid a_1] = \Pr[y \mid a_2]$ for all y belonging to the channel output alphabet.

The proof that Δ < 1 when both the mapping and the channel are non-degenerate (Δ having been defined in (24)) follows along the same lines as the corresponding proof in [1, Appendix I-A].

Appendix V 
Proof of Part 1 of Theorem 5

In this appendix, we prove the necessity condition of Theorem 5. Our proof is a generalization of the proof provided by Richardson et al. [29]. An outline of the proof was provided in Section VI-C.
A. The Erasurized Channel

We begin by defining the erasurized channel for a given cyclic-symmetric channel and examining its properties. Our development in this subsection is general; it is put into the context of the proof in the following subsection.

Definition 9: Let Pr[y | x] denote the transition probabilities of a cyclic-symmetric channel (see Definition 5). Then the corresponding erasurized channel is defined as follows.

The input alphabet is {0, ..., q − 1}. The output alphabet is $\tilde{\mathcal{Y}} = \mathcal{Y} \cup \{0, \ldots, q-1\}$, where $\mathcal{Y}$ is the output alphabet of the original (cyclic-symmetric) channel. The transition probabilities $\widetilde{\Pr}[\tilde{y} \mid x]$ are defined as follows.

For all probability vectors $\mathbf{y} \in \mathcal{Y}$,

$$\widetilde{\Pr}[\mathbf{y} \mid x = i] = \begin{cases} \Pr[\mathbf{y} \mid x = i] & y_i < \max(y_0, \ldots, y_{q-1}) \\ y_{\mathrm{scnd}}\cdot n(\mathbf{y})\cdot Q(\mathbf{y}^*) & y_i = \max(y_0, \ldots, y_{q-1}) \end{cases} \qquad (42)$$

where

• $Q(\mathbf{y}^*)$ is defined as in Definition 5.

• $y_{\mathrm{scnd}}$ is obtained by ordering the elements of the sequence $\{y_0, \ldots, y_{q-1}\}$ in descending order and selecting the second largest. This means that if the maximum of the sequence elements is obtained more than once, then $y_{\mathrm{scnd}}$ is equal to this maximum.

For output alphabet elements $j \in \{0, \ldots, q-1\}$, we define

$$\widetilde{\Pr}[j \mid x = i] = \begin{cases} 0 & j \ne i \\ 1 - \epsilon & j = i \end{cases} \qquad (43)$$

where ε is defined by

$$\epsilon = \sum_{\mathbf{y} \in \mathcal{Y}}\widetilde{\Pr}[\mathbf{y} \mid x = 0]$$
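The following lemma discusses the erasurized channel.

Lemma 24: The erasurized channel satisfies the following properties:

1) The transition probability function is valid.

2) The original cyclic-symmetric channel can be represented as a degraded version of the erasurized channel. That is, it can be represented as a concatenation of the erasurized channel with another channel, whose input is the erasurized channel's output.

Proof:

1) It is easy to verify that ε ≤ 1, and hence $\widetilde{\Pr}[\tilde{y} \mid x = i] \ge 0$ for all i by definition. The rest of the proof follows from the observation that $\widetilde{\Pr}[\mathbf{y} \mid x = i] = \widetilde{\Pr}[\mathbf{y}^{+i} \mid x = 0]$ for all vectors $\mathbf{y} \in \mathcal{Y}$ (recall that $\mathcal{Y} \subset \tilde{\mathcal{Y}}$):

$$\sum_{\tilde{y} \in \tilde{\mathcal{Y}}}\widetilde{\Pr}[\tilde{y} \mid x = i] = \sum_{\mathbf{y} \in \mathcal{Y}}\widetilde{\Pr}[\mathbf{y} \mid x = i] + \widetilde{\Pr}[i \mid x = i] = \sum_{\mathbf{y} \in \mathcal{Y}}\widetilde{\Pr}[\mathbf{y}^{+i} \mid x = 0] + 1 - \epsilon = \sum_{\mathbf{y} \in \mathcal{Y}}\widetilde{\Pr}[\mathbf{y} \mid x = 0] + 1 - \epsilon = 1$$

2) We define a transition probability function $q(\mathbf{y} \mid \tilde{y})$, where $\mathbf{y} \in \mathcal{Y}$ and $\tilde{y} \in \tilde{\mathcal{Y}}$:

$$q(\mathbf{y} \mid \tilde{y}) = \begin{cases} 1 & \tilde{y} = \mathbf{y} \\ \frac{1}{1-\epsilon}\bigl(\Pr[\mathbf{y} \mid x = j] - \widetilde{\Pr}[\mathbf{y} \mid x = j]\bigr) & \tilde{y} = j \in \{0, \ldots, q-1\} \\ 0 & \text{otherwise} \end{cases}$$

It is easy to verify that the concatenation of the erasurized channel with $q(\cdot \mid \cdot)$ produces the transition probabilities Pr[y | x] of the original cyclic-symmetric channel.

□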

The erasurized channel is no longer cyclic-symmetric. Hence, if we apply a belief-propagation decoder to the outputs of an erasurized channel, Lemma 3 does not apply, and the initial messages are not identical to the channel outputs. However, the following lemma summarizes some important properties of the initial message distribution under the all-zero codeword assumption.

Lemma 25: Let Q(z) denote the message distribution at the initial iteration of belief-propagation decoding over an erasurized channel (under the assumption that the zero symbol was transmitted). Then Q(z) can be written as

$$Q(\mathbf{z}) = \epsilon P_E(\mathbf{z}) + (1-\epsilon)\Delta_{[1,0,\ldots,0]} \qquad (44)$$

where $P_E(\mathbf{z})$ is a probability function that satisfies

$$P_E[\exists i \ne 0:\ z_i \ge z_0] = 1 \qquad (45)$$

and $\Delta_{[1,0,\ldots,0]}$ is a distribution that takes the vector [1, 0, ..., 0] (i.e., the vector y where $y_0 = 1$ and $y_i = 0$ for all $i \ne 0$) with probability 1. ($\Delta_{[1,0,\ldots,0]}$ must not be confused with Δ defined by (24).)

Proof: For any probability vector z, we define $P_E(\mathbf{z}) = \Pr(\mathbf{z} \mid \text{the channel output was } \mathbf{y} \in \mathcal{Y})$ and $P_2(\mathbf{z}) = \Pr(\mathbf{z} \mid \text{the channel output was } j \in \{0, \ldots, q-1\})$. We now have

$$Q(\mathbf{z}) = \epsilon P_E(\mathbf{z}) + (1-\epsilon)P_2(\mathbf{z}) \qquad (46)$$

We first examine $P_E(\mathbf{z})$. Let $\mathbf{y} \in \mathcal{Y}$ denote the channel output. By definition we have

$$z_i = \alpha\cdot\widetilde{\Pr}[\mathbf{y} \mid x = i] \qquad (47)$$

where α is a normalization constant, dependent on y but not on i, selected so that the sum of the vector elements $(z_0, \ldots, z_{q-1})$ is 1. We now examine all possibilities for y.

First assume that the maximum of $\{y_0, \ldots, y_{q-1}\}$ is obtained at $y_0$ and at $y_0$ only. Let $i_{\mathrm{scnd}} \ne 0$ be an index where the second-largest element of $\{y_0, \ldots, y_{q-1}\}$ is obtained. Then by (47) and (42),

$$z_0 = \alpha\cdot\widetilde{\Pr}[\mathbf{y} \mid x = 0] = \alpha\cdot y_{i_{\mathrm{scnd}}}\, n(\mathbf{y})Q(\mathbf{y}^*) = \alpha\cdot\widetilde{\Pr}[\mathbf{y} \mid x = i_{\mathrm{scnd}}] = z_{i_{\mathrm{scnd}}}$$

Now assume that the maximum is obtained at $y_0$ and also at $y_{i_{\max}}$, where $i_{\max} \ne 0$. Then it is easy to observe that $z_0 = z_{i_{\max}}$. Finally, assume that the maximum of $\{y_0, \ldots, y_{q-1}\}$ is not obtained at $y_0$. Let $i_{\max}$ be an index such that $y_{i_{\max}}$ obtains the maximum. Then

$$z_0 = \alpha\cdot\widetilde{\Pr}[\mathbf{y} \mid x = 0] = \alpha\cdot y_0\, n(\mathbf{y})Q(\mathbf{y}^*) \le \alpha\cdot y_{\mathrm{scnd}}\, n(\mathbf{y})Q(\mathbf{y}^*) = \alpha\cdot\widetilde{\Pr}[\mathbf{y} \mid x = i_{\max}] = z_{i_{\max}}$$

In all cases, there exists an index $i \ne 0$ such that $z_i \ge z_0$, as required by (45).

We now examine $P_2(\mathbf{z})$. Assume the zero symbol was transmitted. Then by (43), the probability of obtaining any output symbol $j \in \{0, \ldots, q-1\}$ other than j = 0 is zero. Also, the only input symbol capable of producing the output j = 0 with probability greater than zero is the input i = 0. Hence the decoder produces the initial message [1, 0, ..., 0] with probability 1, and $P_2(\mathbf{z}) = \Delta_{[1,0,\ldots,0]}$, as required. □

Consider transmission over the original, cyclic-symmetric channel. Let $P_e$ be the uncoded MAP probability of error. Let $\tilde{P}_e$ be the corresponding probability over the erasurized channel.

In the erasure decomposition lemma of [29], similarly defined $P_e$ and $\tilde{P}_e$ are both equal to $\frac{1}{2}\epsilon$, where ε is the erasure channel's erasure probability. In the following lemma, we examine ε of the erasurized channel.

Lemma 26: The following inequalities hold:

$$\tfrac{1}{2}\epsilon \le \tilde{P}_e \le P_e \le \epsilon$$

Proof:

1) The erasurized channel is symmetric (although not cyclic-symmetric): for all $\mathbf{y} \in \mathcal{Y}$ we have $\widetilde{\Pr}[\mathbf{y} \mid x = i] = \widetilde{\Pr}[\mathbf{y}^{+i} \mid x = 0]$, and for all $j \in \{0, \ldots, q-1\}$ we have $\widetilde{\Pr}[j \mid x = i] = \widetilde{\Pr}[j - i \mid x = 0]$. Hence, the decoding error is independent of the transmitted symbol, and we may assume that the symbol 0 was transmitted.

Consider the erasurized channel output $\tilde{Y}$. The MAP decoder decides on the symbol with the maximum APP value. If more than one such symbol exists, a random decision among the maximizing symbols is made. Let Z denote the vector of APP values corresponding to $\tilde{Y}$. By Lemma 25, with probability ε, Z is distributed as $P_E(\mathbf{z})$. Recalling (45), for messages distributed as $P_E(\mathbf{z})$, an error is made with probability at least 1/2. Therefore, $\tilde{P}_e \ge \frac{1}{2}\epsilon$.

2) By Lemma 24, the cyclic-symmetric channel is a degraded version of the erasurized channel. Hence $P_e \ge \tilde{P}_e$.

3) We now prove $P_e \le \epsilon$. Let us assume once more that the symbol 0 was transmitted. Recall that we are now examining the decoder's performance over the cyclic-symmetric channel (and not the erasurized channel). Therefore, by Lemma 3, the vector of APP values (according to which the MAP decision is made) is identical to the channel output. Let $P_e(\mathbf{y})$ be defined as in Definition 6. We will now show that the following inequality holds:

$$\widetilde{\Pr}[\mathbf{y} \mid x = 0] \ge P_e(\mathbf{y})\cdot\Pr[\mathbf{y} \mid x = 0] \qquad (48)$$

• If y is such that the maximum of $\{y_0, \ldots, y_{q-1}\}$ is obtained only at $y_0$, we have from (42) that $\widetilde{\Pr}[\mathbf{y} \mid x = 0] \le \Pr[\mathbf{y} \mid x = 0]$. However, in this case the decoder correctly decides 0. Hence $P_e(\mathbf{y}) = 0$, and (48) is satisfied.

• In any other case, we have $\widetilde{\Pr}[\mathbf{y} \mid x = 0] = \Pr[\mathbf{y} \mid x = 0]$. Using $P_e(\mathbf{y}) \le 1$, we obtain (48) trivially.

We now have

$$P_e = \sum_{\mathbf{y} \in \mathcal{Y}}P_e(\mathbf{y})\cdot\Pr[\mathbf{y} \mid x = 0] \le \sum_{\mathbf{y} \in \mathcal{Y}}\widetilde{\Pr}[\mathbf{y} \mid x = 0] = \epsilon$$

□
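The erasurized channel and the bounds of Lemma 26 can be checked on a small example. The sketch assumes q = 3 and a hypothetical cyclic-symmetric channel of the form $\Pr[\mathbf{y}\mid x=i] = y_i\, n(\mathbf{y})\, Q(\mathbf{y}^*)$. Assuming the symbol 0 was sent, it computes the transition probabilities (42) and the quantity ε. It also computes the uncoded MAP error probabilities of both channels with random tie-breaking. Function names and channel values are illustrative.

```python
from fractions import Fraction as F

Q = 3  # assumption: q prime (arithmetic mod q)

def shift(x, i):
    """x^{+i}: the k-th element is x_{k+i}."""
    return tuple(x[(k + i) % Q] for k in range(Q))

def n(x):
    """Number of shifts i with x^{+i} == x."""
    return sum(1 for i in range(Q) if shift(x, i) == x)

def cyclic_symmetric_channel(bases):
    """Hypothetical constructor: Pr[y | x = i] = y_i * n(y) * Q(y*) on the shifts of each base."""
    channel = {}
    for b, weight in bases:
        for y in {shift(b, i) for i in range(Q)}:
            channel[y] = [y[i] * n(y) * weight for i in range(Q)]
    return channel

def erasurize(channel):
    """Transition probabilities (42) on the original outputs, and epsilon.
    Uses n(y) Q(y*) = Pr[y | x = i] / y_i for any i with y_i > 0."""
    erased = {}
    for y, lik in channel.items():
        top, second = sorted(y, reverse=True)[:2]
        erased[y] = [lik[i] if y[i] < top else second * lik[i] / y[i] for i in range(Q)]
    eps = sum(row[0] for row in erased.values())
    return erased, eps

def map_error(lik):
    """MAP error probability given x = 0 for likelihoods lik, with random tie-breaking."""
    best = max(lik)
    return 1 - F(1, lik.count(best)) if lik[0] == best else F(1)

channel = cyclic_symmetric_channel([((F(1, 2), F(1, 3), F(1, 6)), F(1, 2)),
                                    ((F(3, 5), F(1, 5), F(1, 5)), F(3, 10)),
                                    ((F(1, 3), F(1, 3), F(1, 3)), F(1, 5))])
erased, eps = erasurize(channel)

# Lemma 24, part 1: each input row sums to 1 (the output symbol j = i carries mass 1 - eps).
valid = all(sum(row[i] for row in erased.values()) + (1 - eps) == 1 for i in range(Q))

# Uncoded MAP error probabilities given x = 0; the output j = 0 is always decoded correctly.
p_e = sum(lik[0] * map_error(lik) for lik in channel.values())
p_e_erased = sum(row[0] * map_error(row) for row in erased.values())

print(valid, eps, p_e_erased, p_e)               # True 239/300 151/300 151/300
print(eps / 2 <= p_e_erased <= p_e <= eps)       # True
```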
B. The Remainder of the Proof

To complete the proof, we would like to show that the probability of error at iteration t cannot be too small. Let $\mathbf{R}_{t+n}$ denote the rightbound messages at iteration t + n, where n = 0, 1, .... By Lemma 4 (in a manner similar to [29]), $\mathbf{R}_t$ may equivalently be obtained as the initial message of a cyclic-symmetric channel. We now replace this channel with the corresponding erasurized channel, and obtain a lower bound on the probability of error at subsequent iterations. We let $\hat{\mathbf{R}}_{t+n}$, n = 0, 1, ..., denote the respective messages following the replacement.

In the remainder of the proof, we switch to the log-likelihood representation of messages. We let $\hat{\mathbf{R}}'_{t+n}$ denote the LLR-vector representation of $\hat{\mathbf{R}}_{t+n}$, n = 0, 1, .... Adopting the notation of [29], we let $Q_n(\mathbf{w})$ denote the distribution of $\hat{\mathbf{R}}'_{t+n}$. $P_0$ denotes the distribution of the initial message $\mathbf{R}'^{(0)}$ of the true cyclic-symmetric channel.

Using LLR messages, Lemma 25 becomes

$$Q_0(\mathbf{w}) = e P_E(\mathbf{w}) + (1 - e)\Delta_{[\infty, \ldots, \infty]}$$

$P_E(\mathbf{w})$ now satisfies

$$P_E[\exists i \ne 0 : W_i \le 0] = 1 \qquad (49)$$

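After $n$ iterations of density evolution, the density becomes (in a manner similar to the equivalent binary case [29])

$$Q_n = e(\lambda'(0)\rho'(1))^n\, \tilde{P}_E \otimes \tilde{P}_0^{\otimes(n-1)} \otimes P_0 + \left(1 - e(\lambda'(0)\rho'(1))^n\right)\Delta_{[\infty, \ldots, \infty]} + O(e^2)$$

where $P_0$ is defined in Theorem 5, $\tilde{P}_0$ and $\tilde{P}_E$ correspond to the random-permutations of $P_0$ and $P_E$ (resulting from the effect of randomly selected labels), respectively, and $\otimes$ denotes convolution. Let $\tilde{Q}_n$ denote the distribution of $(\bar{\mathbf{R}}'_{t+n})^{\times g}$, where $g$ is the random label on the edge along which $\bar{\mathbf{R}}'_{t+n}$ is sent. Then

$$\tilde{Q}_n = e(\lambda'(0)\rho'(1))^n\, \tilde{P}_E \otimes \tilde{P}_0^{\otimes n} + \left(1 - e(\lambda'(0)\rho'(1))^n\right)\Delta_{[\infty, \ldots, \infty]} + O(e^2)$$

where we have used Lemma 23 (Appendix IV-E) to obtain that a random-permutation of $\tilde{P}_E \otimes \tilde{P}_0^{\otimes(n-1)} \otimes P_0$ is distributed as $\tilde{P}_E \otimes \tilde{P}_0^{\otimes n}$. Using Lemma 19 (Appendix IV-E), the probability of error (assuming the zero symbol was transmitted) is the same for $Q_n$ and $\tilde{Q}_n$. Letting $P_e(Q_n)$ denote this probability of error, we have

$$P_e(Q_n) = e(\lambda'(0)\rho'(1))^n\, P_e\big(\tilde{P}_E \otimes \tilde{P}_0^{\otimes n}\big) + O(e^2)$$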

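Defining the probability function $T = \tilde{P}_E \otimes \tilde{P}_0^{\otimes n}$, we have

$$P_e(Q_n) \ge \frac{1}{2} e(\lambda'(0)\rho'(1))^n\, T[\exists i \ne 0 : W_i \le 0] + O(e^2)$$
$$\ge \frac{1}{2} e(\lambda'(0)\rho'(1))^n\, T[W_1 \le 0] + O(e^2)$$
$$\ge \frac{1}{2} e(\lambda'(0)\rho'(1))^n \left(\tilde{P}_E[W_1 \le 0] \cdot \tilde{P}_0^{\otimes n}[W_1 \le 0]\right) + O(e^2) \qquad (50)$$

The first inequality holds because whenever some $W_i \le 0$, the MAP decision errs with probability at least 1/2; the last holds because the first component under $T$ is the sum of independent components distributed according to $\tilde{P}_E$ and $\tilde{P}_0^{\otimes n}$.

Recalling (49), $P_E$ satisfies that, with probability 1, there exists at least one index $i \ne 0$ such that $W_i \le 0$. A random-permutation would transfer $W_i$ to index 1 with probability $1/(q-1)$. Hence

$$\tilde{P}_E[W_1 \le 0] \ge \frac{1}{q-1} \qquad (51)$$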
Let $\tilde{P}_0^{(1)}$ denote the marginal distribution of the element $\tilde{R}'^{(0)}_1$ of $\tilde{\mathbf{R}}'^{(0)}$. By Lemma 9, $\tilde{P}_0^{(1)}$ is symmetrically distributed in the binary sense. Following the development of [29] (similarly relying on results from [32, page 14]), we obtain

$$\lim_{n\to\infty} \frac{1}{n} \log \tilde{P}_0^{\otimes n}[W_1 \le 0] = \log E\exp\left(-\frac{1}{2}\tilde{R}'^{(0)}_1\right) \qquad (52)$$

For the above limit to be valid, we first need (see [32]) that $E\exp(s \cdot \tilde{R}'^{(0)}_1) < \infty$ in some neighborhood of zero, as appears in the conditions of the theorem. We also need to show that $E\tilde{R}'^{(0)}_1 > 0$ (also see [32]). This will be proven shortly. We first examine $E\exp(-\frac{1}{2}\tilde{R}'^{(0)}_1)$.



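Lemma 27:

$$E\exp\left(-\frac{1}{2}\tilde{R}'^{(0)}_1\right) = \Delta \qquad (53)$$

Proof: Recalling that $\tilde{\mathbf{R}}'^{(0)}$ is a random-permutation of the initial message, we first observe

$$E\exp\left(-\frac{1}{2}\tilde{R}'^{(0)}_1\right) = \frac{1}{q-1}\sum_{k=1}^{q-1} E\exp\left(-\frac{1}{2}R'^{(0)}_k\right) \qquad (54)$$

We now examine $E\exp(-\frac{1}{2}R'^{(0)}_k)$. Recalling (14), where $Y$ denotes the random channel output and $V$ denotes the random coset symbol,

$$E\exp\left(-\frac{1}{2}R'^{(0)}_k\right) = E\sqrt{\frac{\Pr[Y \mid \delta(k+V)]}{\Pr[Y \mid \delta(0+V)]}} = \frac{1}{q}\sum_{v=0}^{q-1}\sum_{y}\sqrt{\Pr[y \mid \delta(k+v)]\,\Pr[y \mid \delta(v)]} \qquad (55)$$

Combining (54), (55) and the definition (24), we obtain (53). □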
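Combining (54) and (55) expresses $\Delta$ as an explicit average over $k$, the coset symbol $v$ and the channel output $y$, so it can be evaluated directly for a discrete channel. The sketch below does this under assumptions of ours: $q$ is prime, so GF(q) addition is addition modulo $q$, and the channel is given by a hypothetical transition table `trans` plus a mapping `delta` from code symbols to channel inputs. The q-ary symmetric channel in the usage line is only an illustration.

```python
import math


def stability_delta(trans, delta):
    """Delta as given by (54)-(55):
    (1/(q-1)) * sum_{k=1}^{q-1} (1/q) * sum_v sum_y sqrt(Pr[y|delta(v)] * Pr[y|delta(k+v)]).

    trans[a][y]: Pr[y | a] for channel input a (hypothetical example input).
    delta[v]:    channel input assigned to code symbol v.
    q = len(delta) is assumed prime, so GF(q) addition is addition mod q.
    """
    q = len(delta)
    total = 0.0
    for k in range(1, q):          # outer average of (54)
        for v in range(q):         # average over the coset symbol in (55)
            a, b = delta[v], delta[(v + k) % q]
            total += sum(math.sqrt(pa * pb) for pa, pb in zip(trans[a], trans[b]))
    return total / (q * (q - 1))


def qary_symmetric(q, p):
    """Pr[y | x] for a q-ary symmetric channel with symbol-error probability p."""
    return [[1 - p if y == x else p / (q - 1) for y in range(q)] for x in range(q)]


# Usage: 3-ary symmetric channel, identity mapping.
d = stability_delta(qary_symmetric(3, 0.1), [0, 1, 2])
print(d, 1 / d)  # Delta and the threshold 1/Delta for lambda'(0) * rho'(1)
```

For this channel the sum reduces to $2\sqrt{0.9 \cdot 0.05} + 0.05 \approx 0.474$, so $\lambda'(0)\rho'(1)$ would be compared with roughly 2.11.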
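We are now ready to show $E\tilde{R}'^{(0)}_1 > 0$. Recall from the discussion in Section VI-C that $\Delta < 1$. Using (53) and the Jensen inequality, we obtain

$$-\frac{1}{2}E\tilde{R}'^{(0)}_1 = \log\exp\left(-\frac{1}{2}E\tilde{R}'^{(0)}_1\right) \le \log E\exp\left(-\frac{1}{2}\tilde{R}'^{(0)}_1\right) = \log\Delta < 0$$

We now proceed with the proof. By (53), (52) becomes

$$\lim_{n\to\infty}\frac{1}{n}\log\tilde{P}_0^{\otimes n}[W_1 \le 0] = \log\Delta \qquad (56)$$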
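(52) and (56) are large-deviations statements about the sum of $n$ i.i.d. copies of a binary-symmetric variable. For a two-point distribution the $n$-fold convolution is binomial, so both sides can be computed exactly. In the sketch below the distribution ($W_1 = +a$ with probability $1-p$, $W_1 = -a$ with probability $p$, $a = \log((1-p)/p)$) is our illustrative choice; it is symmetric in the binary sense and has $E e^{-W_1/2} = 2\sqrt{p(1-p)}$.

```python
import math


def log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1-p)^(n-k), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def tail_rate(n, p):
    """(1/n) log Pr[S_n <= 0], where S_n is a sum of n i.i.d. copies of W_1.

    W_1 = +a w.p. 1-p and -a w.p. p, so S_n <= 0 exactly when at least
    half of the n terms equal -a (a binomial tail).
    """
    logs = [log_binom_pmf(n, k, p) for k in range((n + 1) // 2, n + 1)]
    top = max(logs)
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / n


p = 0.1
limit = math.log(2 * math.sqrt(p * (1 - p)))  # log E exp(-W_1/2): right side of (52)
for n in (10, 100, 1000, 4000):
    print(n, round(tail_rate(n, p), 4), round(limit, 4))
```

The finite-$n$ rate stays below the limit (a Chernoff bound) and approaches it roughly like $(\log n)/(2n)$.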

The remainder of the proof follows along the same lines as in [29] and is provided primarily for completeness. Combining (50) with (51) and (56), we obtain that for arbitrary $\eta > 0$ and large enough $n$,

$$P_e(Q_n) \ge \frac{1}{2(q-1)}\, e\left(\lambda'(0)\rho'(1)\cdot(\Delta - \eta)\right)^n + O(e^2)$$

If $\lambda'(0)\rho'(1) > 1/\Delta$, by appropriately selecting $\eta$ we obtain that for $n$ large enough

$$P_e(Q_n) \ge 2e + O(e^2) \qquad (57)$$

$O(\cdot)$ denotes a function, dependent on $\lambda$, $\rho$ and $n$, such that $|O(x)| \le cx$ for some constant $c$. Hence there exists a constant $e(\lambda, \rho, n)$ such that if $e < e(\lambda, \rho, n)$, then

$$P_e(Q_n) > e \qquad (58)$$
We now return to examine $P_e^t$ and $P_e^{t+n}$, the probabilities of error over the true channel, prior to the replacement of messages with those of an erasurized channel. Since the true channel is degraded in relation to the erasurized channel, we must have, for $e < e(\lambda, \rho, n)$, $P_e^{t+n} \ge P_e(Q_n)$.

By Lemma 26, $e \le 2P_e^t$. Hence there exists $\xi(\rho, \lambda, P_0)$ such that if $P_e^t < \xi$, then $e < e(\lambda, \rho, n)$ and hence (58) is satisfied. However, Lemma 26 also asserts $P_e^t \le e$. Hence $P_e(Q_n) > P_e^t$, and consequently $P_e^{t+n} > P_e^t$. This contradicts Theorem 2. Thus we obtain our desired result of $P_e^t \ge \xi(\rho, \lambda, P_0)$ for all $t$. □

Appendix VI 
Proof of Part 2 of Theorem 5

In this section, we prove the sufficiency condition of Theorem 5. Our proof is a generalization of the proof provided by Khandekar [20] from binary to coset GF(q) LDPC codes. An outline of the proof was provided in Section VI-C.

Note that throughout the proof we denote by $O(\cdot)$ functions for which there exists a constant $c > 0$, not dependent on the iteration number $t$, such that $|O(x)| \le c \cdot x$.

We are interested in $P_e(\mathbf{R}_t)$ (defined as in (22)), where $\mathbf{R}_t$ is the rightbound message as defined in Section VI-A. We begin, however, by analyzing a differently defined quantity, $D(\mathbf{R}_t)$.

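Let $\mathbf{X}$ be a probability-vector random variable. The operator $D(\mathbf{X})$ is defined as follows:

$$D(\mathbf{X}) \triangleq E\sqrt{\frac{\tilde{X}_1}{\tilde{X}_0}} = \frac{1}{q-1}\sum_{i=1}^{q-1} E\sqrt{\frac{X_i}{X_0}} \qquad (59)$$

where $\tilde{\mathbf{X}}$ is a random-permutation of $\mathbf{X}$. By definition of the random-permutation, the above definition is equivalent to

$$D(\mathbf{X}) = E\sqrt{\frac{\tilde{X}_k}{\tilde{X}_0}} \qquad (60)$$

for all $k = 1, \ldots, q-1$. Letting $\mathbf{W} = \mathrm{LLR}(\mathbf{X})$, and letting $\tilde{\mathbf{W}}$ be the corresponding random-permutation, we obtain that

$$D(\mathbf{X}) = E e^{-\tilde{W}_1/2}$$

Note that when $q = 2$, this equation coincides with the Bhattacharyya parameter that is used in [20], equation (4.4). From Lemma 27 (Appendix V-B), we obtain

$$D(\mathbf{R}^{(0)}) = \Delta \qquad (61)$$

where $\mathbf{R}^{(0)}$ is the initial message as defined in Section VI-A. We now develop a convenient expression for $D(\mathbf{X})$.

Lemma 28: Let $\mathbf{X}$ denote a probability-vector symmetric random variable. Then $D(\mathbf{X}) = Ef(\mathbf{X})$, where $f(\mathbf{x})$ is given by

$$f(\mathbf{x}) = \frac{1}{q-1}\sum_{\substack{i,j \in GF(q)\\ i \ne j}}\sqrt{x_i x_j} \qquad (62)$$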
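Proof: From (59) we have

$$D(\mathbf{X}) = E\left[\frac{1}{q-1}\sum_{i=1}^{q-1} E\left[\sqrt{\frac{X_i}{X_0}} \,\middle|\, \mathbf{X} \in \mathbf{x}^*\right]\right] \qquad (63)$$

The outer expectation is over all sets $\mathbf{x}^*$. The inner expectation is conditioned on a particular set $\mathbf{x}^*$. We first focus on the inner expectation.

$$\frac{1}{q-1}\sum_{i=1}^{q-1} E\left[\sqrt{\frac{X_i}{X_0}} \,\middle|\, \mathbf{X} \in \mathbf{x}^*\right] = \frac{1}{q-1}\sum_{\mathbf{x} \in \mathbf{x}^*}\sum_{i=1}^{q-1}\sqrt{\frac{x_i}{x_0}}\,\Pr[\mathbf{X} = \mathbf{x} \mid \mathbf{X} \in \mathbf{x}^*]$$
$$= \frac{1}{q-1}\sum_{k=0}^{q-1}\frac{1}{n(\mathbf{x}^{+k})}\sum_{i=1}^{q-1}\sqrt{\frac{x^{+k}_i}{x^{+k}_0}}\,\Pr[\mathbf{X} = \mathbf{x}^{+k} \mid \mathbf{X} \in \mathbf{x}^*] \qquad (64)$$

The last equality was obtained in the same way as (31). In the following, we use the fact that $n(\mathbf{x}^{+k}) = n(\mathbf{x})$ (Lemma 13, Appendix III), together with the symmetry of $\mathbf{X}$:

$$\frac{1}{q-1}\sum_{i=1}^{q-1} E\left[\sqrt{\frac{X_i}{X_0}} \,\middle|\, \mathbf{X} \in \mathbf{x}^*\right] = \frac{1}{q-1}\sum_{k=0}^{q-1}\sum_{i=1}^{q-1}\sqrt{\frac{x_{k+i}}{x_k}}\; x_k = \frac{1}{q-1}\sum_{k=0}^{q-1}\sum_{i=1}^{q-1}\sqrt{x_k x_{k+i}} = f(\mathbf{x})$$

$f(\mathbf{x})$ is invariant under any permutation of the elements. It is therefore constant for all vectors of the set $\mathbf{x}^*$. Thus we can rewrite the above as

$$\frac{1}{q-1}\sum_{i=1}^{q-1} E\left[\sqrt{\frac{X_i}{X_0}} \,\middle|\, \mathbf{X} \in \mathbf{x}^*\right] = E\left[f(\mathbf{X}) \mid \mathbf{X} \in \mathbf{x}^*\right]$$

Plugging the above into (63) completes the proof. □
We now examine the function $f(\cdot)$.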

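Lemma 29: For any probability vector $\mathbf{x}$, $0 \le f(\mathbf{x}) \le 1$.

Proof: $f(\mathbf{x}) \ge 0$ is obtained trivially from (62) by observing that all elements of the sum are nonnegative. To prove $f(\mathbf{x}) \le 1$, we have

$$f(\mathbf{x}) = \frac{1}{q-1}\sum_{i}\sqrt{x_i}\sum_{j \ne i}\sqrt{x_j} = \frac{1}{q-1}\sum_{i}\sqrt{x_i}\Big(\sum_{j}\sqrt{x_j} - \sqrt{x_i}\Big) = \frac{1}{q-1}\left[\Big(\sum_{i}\sqrt{x_i}\Big)^2 - \sum_{i} x_i\right] = \frac{1}{q-1}\left[\Big(\sum_{i}\sqrt{x_i}\Big)^2 - 1\right]$$

Applying Jensen's inequality, $\big(\sum_i \sqrt{x_i}\big)^2 \le q\sum_i x_i = q$, and we obtain

$$f(\mathbf{x}) \le \frac{1}{q-1}(q - 1) = 1$$

□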

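Given a probability vector $\mathbf{x}$, we define $\varepsilon(\mathbf{x}) = 1 - \max(x_0, \ldots, x_{q-1})$. The following lemma relates the functions $\varepsilon(\cdot)$ and $f(\cdot)$.

Lemma 30:

$$\frac{2}{(q-1)\sqrt{q(q-1)}}\sqrt{\varepsilon(\mathbf{x})} \le f(\mathbf{x}) \le q\sqrt{\varepsilon(\mathbf{x})}$$

Proof: Let $i_{\max}$ be an index that achieves the maximum in $(x_0, \ldots, x_{q-1})$. Consider (62). For a particular element $\sqrt{x_i x_j}$ of the sum, assume without loss of generality $i \ne i_{\max}$. Since $\mathbf{x}$ is a probability vector, $x_i \le \sum_{k \ne i_{\max}} x_k = 1 - x_{i_{\max}} = \varepsilon(\mathbf{x})$. We also have $x_j \le 1$. Therefore $\sqrt{x_i x_j} \le \sqrt{\varepsilon(\mathbf{x})}$. We now have

$$f(\mathbf{x}) \le \frac{1}{q-1}\sum_{\substack{i,j \in GF(q)\\ i \ne j}}\sqrt{\varepsilon(\mathbf{x})} = q\sqrt{\varepsilon(\mathbf{x})}$$

Also, $x_{i_{\max}} \ge 1/q$, and there must exist $i \ne i_{\max}$ such that $x_i \ge (1 - x_{i_{\max}})/(q-1) = \varepsilon(\mathbf{x})/(q-1)$. Since both ordered pairs $(i, i_{\max})$ and $(i_{\max}, i)$ appear in (62), we now have

$$f(\mathbf{x}) \ge \frac{2}{q-1}\sqrt{x_i x_{i_{\max}}} \ge \frac{2}{q-1}\sqrt{\frac{\varepsilon(\mathbf{x})}{q(q-1)}}$$

Combining both inequalities proves the lemma. □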
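Lemmas 29 and 30 are easy to probe numerically. The sketch below evaluates $f(\cdot)$ through the identity $\sum_{i \ne j}\sqrt{x_i x_j} = (\sum_i \sqrt{x_i})^2 - \sum_i x_i$ from the proof of Lemma 29, and returns the two bounds of Lemma 30 as stated above; the function names are ours.

```python
import math


def f(x):
    """f(x) of (62), via sum_{i != j} sqrt(x_i x_j) = (sum_i sqrt(x_i))^2 - sum_i x_i."""
    q = len(x)
    s = sum(math.sqrt(v) for v in x)
    return (s * s - sum(x)) / (q - 1)


def eps(x):
    """epsilon(x) = 1 - max(x_0, ..., x_{q-1})."""
    return 1.0 - max(x)


def lemma30_bounds(x):
    """(lower, upper) bounds on f(x) from Lemma 30."""
    q, e = len(x), eps(x)
    lower = 2.0 / ((q - 1) * math.sqrt(q * (q - 1))) * math.sqrt(e)
    return lower, q * math.sqrt(e)


print(f([0.25] * 4), f([1.0, 0.0, 0.0]))  # extremes of Lemma 29: uniform (1) and certain (0)
print(lemma30_bounds([0.7, 0.2, 0.1]), f([0.7, 0.2, 0.1]))
```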
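We now state our main lemma of the proof:

Lemma 31: Let $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(K)}$ be a set of probability vectors. Then

$$1 - f\Big(\bigodot_{k=1}^{K}\mathbf{x}^{(k)}\Big) \ge \prod_{k=1}^{K}\big(1 - f(\mathbf{x}^{(k)})\big) + O\Big(\sum_{\substack{m,n = 1,\ldots,K\\ m \ne n}} f(\mathbf{x}^{(m)})\,f(\mathbf{x}^{(n)})\Big)$$

where $\odot$ denotes GF(q) convolution, defined in (11) and used in (13).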
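Proof: We begin by examining the case of $K = 2$.

We denote $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ by $\mathbf{a}$ and $\mathbf{b}$. To simplify our analysis, we assume that $a_0 = \max(a_0, \ldots, a_{q-1})$. We may assume this because otherwise we can apply a shift by $-i_{\max}$ to move the maximum to zero. This operation does not affect $f(\mathbf{a})$. It is easy to verify that $\mathbf{a}^{+i} \odot \mathbf{b} = (\mathbf{a} \odot \mathbf{b})^{+i}$, and hence the operation does not affect $f(\mathbf{a} \odot \mathbf{b})$ either. Similarly, we assume $b_0 = \max(b_0, \ldots, b_{q-1})$.

By the definition of $f(\cdot)$, we have

$$f(\mathbf{a} \odot \mathbf{b}) = \frac{1}{q-1}\sum_{\substack{i,j \in GF(q)\\ i \ne j}}\sqrt{(\mathbf{a} \odot \mathbf{b})_i\,(\mathbf{a} \odot \mathbf{b})_j} \qquad (65)$$

We now examine elements of the sum. We first examine the case that $i = 0$ and $j \ne 0$.

$$(\mathbf{a} \odot \mathbf{b})_0 \cdot (\mathbf{a} \odot \mathbf{b})_j = \Big(a_0 b_0 + \sum_{k \ne 0} a_k b_{-k}\Big)\cdot\Big(a_j b_0 + a_0 b_j + \sum_{k \ne 0, j} a_k b_{j-k}\Big)$$
$$= \big[a_0 b_0 + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))\big]\cdot\big[a_j b_0 + a_0 b_j + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))\big]$$
$$= a_0 b_0 \cdot a_j b_0 + a_0 b_0 \cdot a_0 b_j + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))$$
$$\le a_0 a_j + b_0 b_j + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))$$

and therefore

$$\sqrt{(\mathbf{a} \odot \mathbf{b})_0\,(\mathbf{a} \odot \mathbf{b})_j} \le \sqrt{a_0 a_j} + \sqrt{b_0 b_j} + O\Big(\sqrt{\varepsilon(\mathbf{a})}\sqrt{\varepsilon(\mathbf{b})}\Big)$$

The result for the case of $i \ne 0$ and $j = 0$ is similarly obtained. We now assume $i, j \ne 0$ (the element $i = j = 0$ does not participate in the sum).

$$(\mathbf{a} \odot \mathbf{b})_i \cdot (\mathbf{a} \odot \mathbf{b})_j = \Big(a_i b_0 + a_0 b_i + \sum_{k \ne 0, i} a_k b_{i-k}\Big)\cdot\Big(a_j b_0 + a_0 b_j + \sum_{k \ne 0, j} a_k b_{j-k}\Big)$$
$$\le \big[a_i + b_i + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))\big]\cdot\big[a_j + b_j + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))\big]$$
$$\le a_i a_j + b_i b_j + O(\varepsilon(\mathbf{a})\varepsilon(\mathbf{b}))$$

where the last step uses $a_i b_j,\ a_j b_i \le \varepsilon(\mathbf{a})\varepsilon(\mathbf{b})$ for $i, j \ne 0$. Hence

$$\sqrt{(\mathbf{a} \odot \mathbf{b})_i\,(\mathbf{a} \odot \mathbf{b})_j} \le \sqrt{a_i a_j} + \sqrt{b_i b_j} + O\Big(\sqrt{\varepsilon(\mathbf{a})}\sqrt{\varepsilon(\mathbf{b})}\Big)$$

Inserting the above into (65), we obtain

$$f(\mathbf{a} \odot \mathbf{b}) \le \frac{1}{q-1}\sum_{i \ne j}\Big(\sqrt{a_i a_j} + \sqrt{b_i b_j} + O\big(\sqrt{\varepsilon(\mathbf{a})}\sqrt{\varepsilon(\mathbf{b})}\big)\Big) = f(\mathbf{a}) + f(\mathbf{b}) + O\Big(\sqrt{\varepsilon(\mathbf{a})}\sqrt{\varepsilon(\mathbf{b})}\Big)$$
$$= f(\mathbf{a}) + f(\mathbf{b}) + O\big(f(\mathbf{a})f(\mathbf{b})\big)$$

The last equality was obtained from Lemma 30. Finally, from the above we easily obtain the desired result of

$$1 - f(\mathbf{a} \odot \mathbf{b}) \ge \big(1 - f(\mathbf{a})\big)\cdot\big(1 - f(\mathbf{b})\big) + O\big(f(\mathbf{a})f(\mathbf{b})\big)$$

For the case of $K > 2$, we begin by observing that

$$\bigodot_{k=1}^{K}\mathbf{x}^{(k)} = \Big(\bigodot_{k=1}^{K-1}\mathbf{x}^{(k)}\Big) \odot \mathbf{x}^{(K)}$$

The remainder of the proof is obtained by induction, using Lemma 29. □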
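The $K = 2$ case of Lemma 31 can be illustrated with vectors that are nearly certain of the symbol 0, the regime in which the lemma is used. The sketch assumes prime $q$, so that the GF(q) convolution is a cyclic convolution modulo $q$; `near_certain` is a hypothetical test family, and the check covers only that family, not the $O(\cdot)$ statement in general. It also checks the shift identity $\mathbf{a}^{+i} \odot \mathbf{b} = (\mathbf{a} \odot \mathbf{b})^{+i}$ used in the proof, with $(\mathbf{x}^{+i})_k = x_{k+i}$.

```python
import math


def f(x):
    """f(x) of (62)."""
    q = len(x)
    s = sum(math.sqrt(v) for v in x)
    return (s * s - sum(x)) / (q - 1)


def gf_conv(a, b):
    """GF(q) convolution (a . b)_j = sum_k a_k b_{j-k}; q prime, indices mod q."""
    q = len(a)
    return [sum(a[k] * b[(j - k) % q] for k in range(q)) for j in range(q)]


def shift(x, i):
    """x^{+i}: (x^{+i})_k = x_{k+i} (indices mod q)."""
    q = len(x)
    return [x[(k + i) % q] for k in range(q)]


def near_certain(q, e):
    """Hypothetical test vector: mass 1-e on symbol 0, e spread evenly elsewhere."""
    return [1 - e] + [e / (q - 1)] * (q - 1)


q = 5
for e in (1e-2, 1e-3, 1e-4):
    a = near_certain(q, e)
    gap = (1 - f(gf_conv(a, a))) - (1 - f(a)) ** 2  # left side minus product term, K = 2
    print(e, round(f(a), 5), gap)
```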
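We now use the above lemma to obtain the following result.

Lemma 32: $D(\mathbf{R}_{t+1})$ satisfies

$$D(\mathbf{R}_{t+1}) \le \Delta \cdot \lambda\Big(1 - \rho\big(1 - D(\mathbf{R}_t)\big) + O\big(D(\mathbf{R}_t)^2\big)\Big) \qquad (66)$$

Proof: Consider $\hat{\mathbf{R}}_t$. Since $\mathbf{R}_t$ is obtained from it by applying a random permutation $\times g^{-1}$, we obtain, using Lemma 28 and the fact that $f(\mathbf{x})$ is invariant under a permutation of $\mathbf{x}$, that $D(\mathbf{R}_t) = Ef(\mathbf{R}_t) = Ef(\hat{\mathbf{R}}_t) = D(\hat{\mathbf{R}}_t)$. Thus we may instead examine $\hat{\mathbf{R}}_t$. Similarly, we examine $\hat{\mathbf{L}}_{t+1}$ instead of $\mathbf{L}_{t+1}$.

Assume the right-degree at a check node is $d$. By (13) we have

$$1 - D(\hat{\mathbf{L}}_{t+1}) = 1 - D\Big(\bigodot_{k=1}^{d-1}\hat{\mathbf{R}}^{(k)}_t\Big)$$

where $\{\hat{\mathbf{R}}^{(k)}_t\}_{k=1}^{d-1}$ are i.i.d. and distributed as $\hat{\mathbf{R}}_t$. In the following, we make use of Lemma 31:

$$1 - D(\hat{\mathbf{L}}_{t+1}) = 1 - Ef\Big(\bigodot_{k=1}^{d-1}\hat{\mathbf{R}}^{(k)}_t\Big)$$
$$\ge E\Big[\prod_{k=1}^{d-1}\big(1 - f(\hat{\mathbf{R}}^{(k)}_t)\big)\Big] + O\Big(\sum_{m \ne n} E\big[f(\hat{\mathbf{R}}^{(m)}_t)\,f(\hat{\mathbf{R}}^{(n)}_t)\big]\Big)$$
$$= \prod_{k=1}^{d-1}\big(1 - Ef(\hat{\mathbf{R}}^{(k)}_t)\big) + O\Big(\sum_{m \ne n} Ef(\hat{\mathbf{R}}^{(m)}_t)\, Ef(\hat{\mathbf{R}}^{(n)}_t)\Big)$$
$$= \prod_{k=1}^{d-1}\big(1 - D(\hat{\mathbf{R}}^{(k)}_t)\big) + O\Big(\sum_{m \ne n} D(\hat{\mathbf{R}}^{(m)}_t)\, D(\hat{\mathbf{R}}^{(n)}_t)\Big)$$
$$= \big(1 - D(\mathbf{R}_t)\big)^{d-1} + O\big(D(\mathbf{R}_t)^2\big)$$

Averaging over all possible values of $d$, we obtain

$$1 - D(\hat{\mathbf{L}}_{t+1}) \ge \sum_d \rho_d\Big[\big(1 - D(\mathbf{R}_t)\big)^{d-1} + O\big(D(\mathbf{R}_t)^2\big)\Big] = \sum_d \rho_d\big(1 - D(\mathbf{R}_t)\big)^{d-1} + O\big(D(\mathbf{R}_t)^2\big) = \rho\big(1 - D(\mathbf{R}_t)\big) + O\big(D(\mathbf{R}_t)^2\big) \qquad (67)$$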



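We now turn to examine $D(\mathbf{R}_{t+1})$. Assume the variable-node degree at which $\mathbf{R}_{t+1}$ is produced is $\mathrm{deg}$. Applying (59) and the variable-node update rule, we have

$$D(\mathbf{R}_{t+1}) = E\left[\frac{1}{q-1}\sum_{i=1}^{q-1}\sqrt{\frac{R_{t+1,i}}{R_{t+1,0}}}\right] = E\left[\frac{1}{q-1}\sum_{i=1}^{q-1}\sqrt{\frac{R^{(0)}_i}{R^{(0)}_0}}\prod_{n=1}^{\mathrm{deg}-1}\sqrt{\frac{L^{(n)}_{t+1,i}}{L^{(n)}_{t+1,0}}}\right]$$

where $\{\mathbf{L}^{(n)}_{t+1}\}_{n=1}^{\mathrm{deg}-1}$ are i.i.d. and distributed as $\mathbf{L}_{t+1}$. By Theorem 4, $\{\mathbf{L}^{(n)}_{t+1}\}_{n=1}^{\mathrm{deg}-1}$ are permutation-invariant, and thus, by Lemma 21 (Appendix IV-E), are distributed identically with their random-permutations $\{\tilde{\mathbf{L}}^{(n)}_{t+1}\}_{n=1}^{\mathrm{deg}-1}$. Thus we obtain

$$D(\mathbf{R}_{t+1}) = E\left[\frac{1}{q-1}\sum_{i=1}^{q-1}\sqrt{\frac{R^{(0)}_i}{R^{(0)}_0}}\prod_{n=1}^{\mathrm{deg}-1}\sqrt{\frac{\tilde{L}^{(n)}_{t+1,i}}{\tilde{L}^{(n)}_{t+1,0}}}\right]$$

Applying (60) and reordering the elements, we obtain

$$D(\mathbf{R}_{t+1}) = E\left[\frac{1}{q-1}\sum_{i=1}^{q-1}\sqrt{\frac{R^{(0)}_i}{R^{(0)}_0}}\right]\prod_{n=1}^{\mathrm{deg}-1}D(\mathbf{L}^{(n)}_{t+1}) = D(\mathbf{R}^{(0)}) \cdot D(\mathbf{L}_{t+1})^{\mathrm{deg}-1} = \Delta \cdot D(\mathbf{L}_{t+1})^{\mathrm{deg}-1}$$

The second equality was obtained from (59). The last equality is obtained from (61). Averaging over all values of $\mathrm{deg}$, we obtain

$$D(\mathbf{R}_{t+1}) = \Delta \cdot \lambda\big(D(\mathbf{L}_{t+1})\big) \qquad (68)$$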
The function $\lambda(x)$ is by definition a polynomial with nonnegative coefficients. It is thus nondecreasing in the range $0 \le x \le 1$. Using (67) and (68), and recalling $D(\hat{\mathbf{L}}_{t+1}) = D(\mathbf{L}_{t+1})$, we obtain (66). □
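The following lemma examines convergence to zero of $D(\mathbf{R}_t)$.

Lemma 33: If $\lambda'(0)\rho'(1) < 1/\Delta$, then there exists $\alpha > 0$ such that if $D(\mathbf{R}_{t_0}) < \alpha$ at some iteration $t_0$, then $\lim_{t\to\infty} D(\mathbf{R}_t) = 0$.

Proof: Using the Taylor expansion of the function $\rho(1 - x)$ around $x = 0$,

$$\rho\big(1 - D(\mathbf{R}_t)\big) = \rho(1) - \rho'(1)\cdot D(\mathbf{R}_t) + O\big(D(\mathbf{R}_t)^2\big) = 1 - \rho'(1)\cdot D(\mathbf{R}_t) + O\big(D(\mathbf{R}_t)^2\big)$$

where the equality $\rho(1) = 1$ is obtained by the definition of the function $\rho(x)$. Plugging the above into (66), we obtain

$$D(\mathbf{R}_{t+1}) \le \Delta \cdot \lambda\Big(\rho'(1)\cdot D(\mathbf{R}_t) + O\big(D(\mathbf{R}_t)^2\big)\Big)$$

Using the Taylor expansion of $\lambda(x)$ around $x = 0$ (where $\lambda(0) = 0$), we obtain

$$D(\mathbf{R}_{t+1}) \le \Delta \cdot \Big[\lambda(0) + \lambda'(0)\cdot\Big(\rho'(1)\cdot D(\mathbf{R}_t) + O\big(D(\mathbf{R}_t)^2\big)\Big) + O\Big(\big(\rho'(1)\cdot D(\mathbf{R}_t) + O(D(\mathbf{R}_t)^2)\big)^2\Big)\Big]$$
$$= \Delta \cdot \lambda'(0)\rho'(1)\cdot D(\mathbf{R}_t) + O\big(D(\mathbf{R}_t)^2\big)$$

Since $\Delta \cdot \lambda'(0)\rho'(1) < 1$, there exists $\alpha$ such that if $D(\mathbf{R}_{t_0}) < \alpha$, then

$$D(\mathbf{R}_{t_0+1}) \le K \cdot D(\mathbf{R}_{t_0}) \le D(\mathbf{R}_{t_0}) < \alpha$$

where $K$ is a positive constant smaller than 1. By induction, this holds for all $t \ge t_0$. We have $D(\mathbf{R}_t) \ge 0$ by definition, and therefore the sequence $\{D(\mathbf{R}_t)\}_{t=t_0}^{\infty}$ converges to zero. □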
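Lemmas 32 and 33 reduce the analysis to a scalar recursion. The sketch below iterates $D \leftarrow \Delta\cdot\lambda(1 - \rho(1 - D))$, that is, (66) with the $O(\cdot)$ term dropped, for a hypothetical pair $\lambda(x) = 0.3x + 0.7x^2$ and $\rho(x) = x^5$ (so $\lambda'(0)\rho'(1) = 1.5$ and the threshold of Lemma 33 is $\Delta < 2/3$). It only illustrates the local nature of the lemma and is not a substitute for density evolution.

```python
def lam(x):
    """Hypothetical variable-node edge degree distribution: lambda'(0) = 0.3."""
    return 0.3 * x + 0.7 * x ** 2


def rho(x):
    """Hypothetical check-node edge degree distribution (all degree 6): rho'(1) = 5."""
    return x ** 5


def iterate_bound(delta, d0, iters=200):
    """Iterate D <- Delta * lambda(1 - rho(1 - D)): recursion (66) without its O(D^2) term."""
    d = d0
    for _ in range(iters):
        d = delta * lam(1 - rho(1 - d))
    return d


print(iterate_bound(0.5, 0.01))   # Delta < 2/3 and a small start: decays toward 0
print(iterate_bound(0.5, 0.3))    # same Delta, larger start: stays away from 0
print(iterate_bound(0.9, 1e-6))   # Delta > 2/3: grows even from a tiny start
```

With $\Delta = 0.5$ a start at 0.3 settles near a nonzero fixed point (about 0.46), which is why the lemma requires $D(\mathbf{R}_{t_0}) < \alpha$.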
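Finally, the following lemma links the operator $D(\cdot)$ with our desired $P_e(\cdot)$, defined as in (22).

Lemma 34: Let $\mathbf{X}$ be a symmetric probability-vector random variable. Then

$$\frac{1}{q^2}\cdot D(\mathbf{X})^2 \le P_e(\mathbf{X}) \le (q-1)\cdot D(\mathbf{X})$$

Proof: We begin by showing that $P_e(\mathbf{X}) = E\varepsilon(\mathbf{X})$.

$$P_e(\mathbf{X}) = \sum_{\mathbf{x}} P_e(\mathbf{x})\Pr[\mathbf{X} = \mathbf{x}] = \sum_{\mathbf{x}^*}\left(\sum_{i=0}^{q-1}\frac{1}{n(\mathbf{x}^{+i})}P_e(\mathbf{x}^{+i})\Pr[\mathbf{X} = \mathbf{x}^{+i} \mid \mathbf{X} \in \mathbf{x}^*]\right)\Pr[\mathbf{X} \in \mathbf{x}^*]$$

The last result was obtained in the same way as (63) and (64). The outer sum is over all sets $\mathbf{x}^*$. Let $i_1, \ldots, i_m$ denote the indices that achieve $\max(x_0, \ldots, x_{q-1})$. Then $P_e(\mathbf{x}^{+i}) = (m-1)/m$ if $i \in \{i_1, \ldots, i_m\}$ and 1 otherwise. Using this and the symmetry of $\mathbf{X}$, we obtain

$$P_e(\mathbf{X}) = \sum_{\mathbf{x}^*}\left(\sum_{i \in \{i_1,\ldots,i_m\}}\frac{m-1}{m}\cdot x_i\,\frac{n(\mathbf{x})}{n(\mathbf{x}^{+i})} + \sum_{i \notin \{i_1,\ldots,i_m\}} 1 \cdot x_i\,\frac{n(\mathbf{x})}{n(\mathbf{x}^{+i})}\right)\Pr[\mathbf{X} \in \mathbf{x}^*]$$

By Lemma 13 (Appendix III), $n(\mathbf{x}^{+i}) = n(\mathbf{x})$. We thus continue our development:

$$P_e(\mathbf{X}) = \sum_{\mathbf{x}^*}\left(\sum_{i=0}^{q-1} x_i - \frac{1}{m}\sum_{i \in \{i_1,\ldots,i_m\}} x_{\max}\right)\Pr[\mathbf{X} \in \mathbf{x}^*] = \sum_{\mathbf{x}^*}(1 - x_{\max})\Pr[\mathbf{X} \in \mathbf{x}^*] = \sum_{\mathbf{x}^*}\varepsilon(\mathbf{x})\Pr[\mathbf{X} \in \mathbf{x}^*]$$

The result $P_e(\mathbf{X}) = E\varepsilon(\mathbf{X})$ is obtained from the fact that $\varepsilon(\cdot)$ is constant over all vectors in $\mathbf{x}^*$.

We now have, using Lemmas 28 and 30 and the Jensen inequality,

$$D(\mathbf{X}) = Ef(\mathbf{X}) \le qE\sqrt{\varepsilon(\mathbf{X})} \le q\sqrt{E\varepsilon(\mathbf{X})} = q\sqrt{P_e(\mathbf{X})}$$

This proves $\frac{1}{q^2}D(\mathbf{X})^2 \le P_e(\mathbf{X})$. For the second inequality, we observe

$$P_e(\mathbf{X}) \le \Pr[\exists i \ne 0 : X_i \ge X_0] \le \sum_{i=1}^{q-1}\Pr[X_i \ge X_0] = \sum_{i=1}^{q-1}\Pr\left[\sqrt{\frac{X_i}{X_0}} \ge 1\right] \le \sum_{i=1}^{q-1}E\sqrt{\frac{X_i}{X_0}}$$

The last inequality is obtained by Markov's inequality. Combining the above with (59), we obtain our desired result of $P_e(\mathbf{X}) \le (q-1)\cdot D(\mathbf{X})$. □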

Finally, consider the value $\alpha$ of Lemma 33. Setting $\xi = \alpha^2/q^2$, we have from Lemma 34 that if $P_e(\mathbf{R}_{t_0}) < \xi$, then $D(\mathbf{R}_{t_0}) < \alpha$, and thus $D(\mathbf{R}_t)$ converges to zero. Applying Lemma 34 again, this implies that $P_e(\mathbf{R}_t)$ converges to zero, and thus completes the proof of Part 2 of the theorem. □
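Lemma 34 can be checked by Monte Carlo on any symmetric message distribution. The sketch below uses the Gaussian LLR vectors of (25), which are symmetric and permutation-invariant by Theorem 6, sampled with a shared component so that the covariance is $\sigma^2$ on the diagonal and $\sigma^2/2$ off it. For this family $D(\mathbf{X}) = Ee^{-W_1/2} = e^{-\sigma^2/8}$ in closed form, and $P_e(\mathbf{X}) = E\varepsilon(\mathbf{X})$ as in the proof above. Sample sizes and seeds are arbitrary choices of ours.

```python
import math
import random


def sample_w(q, sigma, rng):
    """Gaussian LLR vector of (25): mean sigma^2/2, covariance sigma^2 (diag), sigma^2/2 (off-diag)."""
    z0 = rng.gauss(0.0, 1.0)  # shared term produces the off-diagonal covariance
    s = sigma / math.sqrt(2.0)
    return [sigma * sigma / 2 + s * (rng.gauss(0.0, 1.0) + z0) for _ in range(q - 1)]


def to_prob(w):
    """Probability vector with x_0 proportional to 1 and x_i proportional to exp(-w_i)."""
    terms = [1.0] + [math.exp(-wi) for wi in w]
    total = sum(terms)
    return [t / total for t in terms]


def estimate(q, sigma, n=20000, seed=1):
    """Monte Carlo estimates of D(X) and P_e(X) for X = to_prob(W)."""
    rng = random.Random(seed)
    d_sum = pe_sum = 0.0
    for _ in range(n):
        w = sample_w(q, sigma, rng)
        d_sum += math.exp(-w[0] / 2)       # D(X) = E exp(-W_1/2); W is permutation-invariant
        pe_sum += 1.0 - max(to_prob(w))    # P_e(X) = E eps(X) for symmetric X
    return d_sum / n, pe_sum / n


q, sigma = 4, 2.0
d, pe = estimate(q, sigma)
print(d, math.exp(-sigma ** 2 / 8))         # Monte Carlo vs closed form
print(d * d / q ** 2 <= pe <= (q - 1) * d)  # the two inequalities of Lemma 34
```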

Appendix VII 
Proof of Theorem 6

We begin by observing that since $\mathbf{W}$ is Gaussian, $\mathbf{W}$ is symmetric if and only if, for all $i = 1, \ldots, q-1$ and arbitrary LLR vector $\mathbf{w}$,

$$2w_i = 2\log\frac{f_{\mathbf{W}}(\mathbf{w})}{f_{\mathbf{W}}(\mathbf{w}^{+i})} = (\mathbf{w}^{+i} - \mathbf{m})^T\Sigma^{-1}(\mathbf{w}^{+i} - \mathbf{m}) - (\mathbf{w} - \mathbf{m})^T\Sigma^{-1}(\mathbf{w} - \mathbf{m}) \qquad (69)$$

where $f_{\mathbf{W}}$ denotes the probability density function of $\mathbf{W}$.

We first assume that $\mathbf{W}$ is symmetric and permutation-invariant and prove (25). Since $\mathbf{W}$ is permutation-invariant, by Lemma 11 we have $m_i = EW_i = EW_j = m_j$ for all $i, j = 1, \ldots, q-1$. We therefore denote $m = m_1 = \cdots = m_{q-1}$.

We begin by proving that $m \ne 0$. We prove this by contradiction, and hence we first assume $m = 0$. Consider the marginal distribution of $W_i$ for $i = 1, \ldots, q-1$, which must also be Gaussian. Since $m_i = 0$, the pdf of $W_i$ satisfies $f_i(w) = f_i(-w)$. By Lemma 9, $W_i$ is symmetric in the binary sense. Hence $f_i(w) = e^{w}f_i(-w)$. Combining both equations yields $f_i(w) = 0$ for all $w \ne 0$. Hence $W_i$ is deterministic, with zero variance, for all $i$. This leads to $\Sigma = 0$, which contradicts the theorem's condition that $\Sigma$ is nonsingular.
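We now show that conditions (69), $i = 1, \ldots, q-1$, uniquely define $\Sigma$. Since $\Sigma$ is symmetric, so is $\Sigma^{-1}$. Assume $A$ and $B$ are two symmetric matrices such that (69) is satisfied when substituting $\Sigma^{-1}$ with $A$ and with $B$, respectively. We now show that $A = B$. Let $D = A - B$. Subtracting the equation for $B$ from that of $A$, we obtain, for $i = 1, \ldots, q-1$,

$$0 = (\mathbf{w}^{+i} - \mathbf{m})^T D\,(\mathbf{w}^{+i} - \mathbf{m}) - (\mathbf{w} - \mathbf{m})^T D\,(\mathbf{w} - \mathbf{m}) \qquad (70)$$

For convenience, we let $L_i$ denote the matrix corresponding to the linear transformation $L_i\mathbf{w} = \mathbf{w}^{+i}$. Differentiating (70) twice with respect to $\mathbf{w}$, we obtain that $L_i^T D L_i = D$. (70) may now be rewritten as

$$0 = (\mathbf{w}^{+i} - \mathbf{m})^T D\,(\mathbf{w}^{+i} - \mathbf{m}) - (\mathbf{w} - \mathbf{m})^T L_i^T D L_i\,(\mathbf{w} - \mathbf{m})$$

Let $\mathbf{x} = \mathbf{w}^{+i}$. Observe that $\mathbf{x}$, like $\mathbf{w}$, is arbitrary. Since $L_i(\mathbf{w} - \mathbf{m}) = \mathbf{x} - \mathbf{m}^{+i}$, simple algebraic manipulations lead us to

$$2\mathbf{x}^T D(\mathbf{m}^{+i} - \mathbf{m}) = (\mathbf{m}^{+i})^T D\,\mathbf{m}^{+i} - \mathbf{m}^T D\,\mathbf{m} = (\mathbf{m}^{+i})^T D\,\mathbf{m}^{+i} - \mathbf{m}^T L_i^T D L_i\,\mathbf{m} = 0$$

Letting $\mathbf{x} = D(\mathbf{m}^{+i} - \mathbf{m})$, we obtain that $\|D(\mathbf{m}^{+i} - \mathbf{m})\|^2 = 0$, where $\|\cdot\|$ denotes the Euclidean norm. Thus $D(\mathbf{m}^{+i} - \mathbf{m}) = 0$. Consider the vectors $\{\mathbf{m}^{+i} - \mathbf{m}\}_{i=1}^{q-1}$. We wish to show that these vectors are linearly independent. From the definition of $\mathbf{w}^{+i}$, we have $(\mathbf{m}^{+i} - \mathbf{m})_k = m_{i+k} - m_i - m_k$. Recall from Section II that $i + k$ is evaluated over GF(q) and that $m_0 = 0$. From our previous discussion, $m_i = m$ for all $i = 1, \ldots, q-1$. Therefore, for all $i \ne 0$, $k \ne 0$,

$$m_{i+k} - m_i - m_k = \begin{cases} -m & k \ne -i \\ -2m & k = -i \end{cases}$$

We now put the vectors $\{\mathbf{m}^{+i} - \mathbf{m}\}_{i=1}^{q-1}$ in a matrix $M$ such that $M_{k,j} = (\mathbf{m}^{-j} - \mathbf{m})_k$. The matrix $M$ is now given by

$$M = \begin{pmatrix} -2m & -m & \cdots & -m \\ -m & -2m & \cdots & -m \\ \vdots & \vdots & \ddots & \vdots \\ -m & -m & \cdots & -2m \end{pmatrix}$$

Let the matrix $V$ be defined by

$$V = \frac{1}{m}\begin{pmatrix} \frac{1}{q} - 1 & \frac{1}{q} & \cdots & \frac{1}{q} \\ \frac{1}{q} & \frac{1}{q} - 1 & \cdots & \frac{1}{q} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{q} & \frac{1}{q} & \cdots & \frac{1}{q} - 1 \end{pmatrix}$$

That is, $V_{ij} = (1/q - \delta[i-j])/m$. It is easy to verify that $V$ is the inverse of $M$. Hence $M$ is nonsingular, and its columns, the vectors $\{\mathbf{m}^{+i} - \mathbf{m}\}_{i=1}^{q-1}$, are thus linearly independent. We now have $q-1$ linearly independent vectors that satisfy $D(\mathbf{m}^{+i} - \mathbf{m}) = 0$. Hence $D = 0$, and we obtain that $A = B$, as desired.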
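The claim that $V$ inverts $M$ can be confirmed with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` with arbitrary test values of $m$ and $q$.

```python
from fractions import Fraction


def matrix_m(q, m):
    """M of Appendix VII: -2m on the diagonal, -m elsewhere, size (q-1) x (q-1)."""
    return [[-2 * m if k == j else -m for j in range(q - 1)] for k in range(q - 1)]


def matrix_v(q, m):
    """V_ij = (1/q - delta[i-j]) / m."""
    return [[(Fraction(1, q) - (1 if i == j else 0)) / m for j in range(q - 1)]
            for i in range(q - 1)]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


m = Fraction(3, 2)  # arbitrary nonzero test value
for q in range(2, 8):
    identity = [[1 if i == j else 0 for j in range(q - 1)] for i in range(q - 1)]
    print(q, matmul(matrix_m(q, m), matrix_v(q, m)) == identity)  # True for each q
```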
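Consider the matrix $M$. If we could show that $\Sigma = -M$, we would obtain (25) for $\sigma = \sqrt{2m}$ ($m > 0$ would be implied by $\Sigma_{i,i} = 2m$). For this purpose, we show that the choice $\Sigma^{-1} = (-M)^{-1} = -V$ satisfies (69).

$$(\mathbf{w}^{+i} - \mathbf{m})^T(-V)(\mathbf{w}^{+i} - \mathbf{m}) - (\mathbf{w} - \mathbf{m})^T(-V)(\mathbf{w} - \mathbf{m})$$
$$= (\mathbf{w}^{+i} - \mathbf{m} + \mathbf{w} - \mathbf{m})^T(-V)(\mathbf{w}^{+i} - \mathbf{m} - \mathbf{w} + \mathbf{m})$$
$$= (\mathbf{w}^{+i} + \mathbf{w} - 2\mathbf{m})^T(-V)(\mathbf{w}^{+i} - \mathbf{w})$$
$$= \sum_{k,j=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right)\left(w^{+i}_j - w_j\right)\cdot\frac{\delta[k-j] - 1/q}{m}$$
$$= \frac{1}{m}\sum_{k=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right)\left(w^{+i}_k - w_k\right) - \frac{1}{qm}\sum_{k=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right)\cdot\sum_{j=1}^{q-1}\left(w^{+i}_j - w_j\right) \qquad (71)$$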

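We now treat each of the above sums separately.

$$\sum_{k=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right)\left(w^{+i}_k - w_k\right) = \sum_{k=1}^{q-1}\left[(w_{k+i} - w_i)^2 - w_k^2\right] - 2m\sum_{k=1}^{q-1}\left(w_{k+i} - w_i - w_k\right)$$
$$= \sum_{k=1}^{q-1} w_{k+i}^2 + \sum_{k=1}^{q-1} w_i^2 - 2w_i\sum_{k=1}^{q-1} w_{k+i} - \sum_{k=1}^{q-1} w_k^2 - 2m\sum_{k=1}^{q-1} w_{k+i} + 2m\sum_{k=1}^{q-1} w_i + 2m\sum_{k=1}^{q-1} w_k \qquad (72)$$

The set of indices $\{k + i : k = 1, \ldots, q-1\} = \{0, \ldots, q-1\} \setminus \{i\}$. Recalling $w_0 = 0$, we have

$$\sum_{k=1}^{q-1} w_{k+i}^2 = \sum_{k=1}^{q-1} w_k^2 - w_i^2, \qquad \sum_{k=1}^{q-1} w_{k+i} = \sum_{k=1}^{q-1} w_k - w_i \qquad (73)$$

(72) now becomes

$$\sum_{k=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right)\left(w^{+i}_k - w_k\right) = \Big(\sum_{k=1}^{q-1} w_k^2 - w_i^2\Big) + (q-1)w_i^2 - 2w_i\Big(\sum_{k=1}^{q-1} w_k - w_i\Big) - \sum_{k=1}^{q-1} w_k^2 - 2m\Big(\sum_{k=1}^{q-1} w_k - w_i\Big) + 2m(q-1)w_i + 2m\sum_{k=1}^{q-1} w_k$$
$$= q\cdot w_i^2 - 2w_i\sum_{k=1}^{q-1} w_k + 2mq\cdot w_i \qquad (74)$$

We now turn to the second sum of (71). In a development similar to that of the first sum, we obtain

$$\sum_{k=1}^{q-1}\left(w^{+i}_k + w_k - 2m\right) = 2\sum_{k=1}^{q-1} w_k - q\cdot w_i - 2(q-1)m \qquad (75)$$

Finally, the last sum of (71) becomes

$$\sum_{j=1}^{q-1}\left(w^{+i}_j - w_j\right) = -q\cdot w_i \qquad (76)$$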
Combining (74), (75) and (76), we obtain

$$(\mathbf{w}^{+i} - \mathbf{m})^T(-V)(\mathbf{w}^{+i} - \mathbf{m}) - (\mathbf{w} - \mathbf{m})^T(-V)(\mathbf{w} - \mathbf{m})$$
$$= \frac{1}{m}\Big(q\cdot w_i^2 - 2w_i\sum_{k=1}^{q-1} w_k + 2mq\cdot w_i\Big) - \frac{1}{qm}\Big(2\sum_{k=1}^{q-1} w_k - q\cdot w_i - 2(q-1)m\Big)\cdot(-q\cdot w_i)$$
$$= 2w_i \qquad (77)$$

Thus $\Sigma^{-1} = -V$ satisfies (69), as desired. This completes the proof of (25).

We now assume (25) and prove that $\mathbf{W}$ is symmetric and permutation-invariant. From (25) it is clear that any reordering of the elements of $\mathbf{W}$ has no effect on its distribution, and thus $\mathbf{W}$ is permutation-invariant. To prove symmetry, we observe that the development ending with (77) relies on (25) alone, and thus remains valid. □
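The computation leading to (77) can also be confirmed numerically. With $m = \sigma^2/2$, the matrix $-V$ equals $(2/\sigma^2)(I - J/q)$, where $J$ is the all-ones matrix, and the difference of quadratic forms in (69) should equal $2w_i$ for every $\mathbf{w}$ and $i$. The sketch assumes prime $q$ (GF(q) addition modulo $q$) and uses the shift $(\mathbf{w}^{+i})_k = w_{k+i} - w_i$ with $w_0 = 0$.

```python
import random


def shift(w, i, q):
    """LLR shift: (w^{+i})_k = w_{k+i} - w_i over Z_q (q prime), with implicit w_0 = 0."""
    full = [0.0] + list(w)
    return [full[(k + i) % q] - full[i] for k in range(1, q)]


def quad_sigma_inv(u, sigma, q):
    """u^T Sigma^{-1} u with Sigma^{-1} = -V = (2 / sigma^2) (I - J/q)."""
    s = sum(u)
    return (2.0 / sigma ** 2) * (sum(x * x for x in u) - s * s / q)


def symmetry_residual(w, i, sigma, q):
    """Right side of (69) minus 2 w_i, for the Gaussian of (25); should vanish."""
    m = sigma ** 2 / 2
    a = [x - m for x in shift(w, i, q)]
    b = [x - m for x in w]
    return quad_sigma_inv(a, sigma, q) - quad_sigma_inv(b, sigma, q) - 2 * w[i - 1]


rng = random.Random(0)
worst = max(abs(symmetry_residual([rng.uniform(-5, 5) for _ in range(q - 1)], i, sigma, q))
            for q in (2, 3, 5, 7) for i in range(1, q) for sigma in (0.7, 2.0)
            for _ in range(20))
print(worst)  # at the level of floating-point rounding
```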

Appendix VIII 
Proofs for Section VII

A. Proof of Lemma 17

By Lemma 11 (Appendix III-A),

$$I(C;\mathbf{W}) = \sum_{k=0}^{q-1}\Pr[C = k]\sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w} \mid C = k]\log_q\frac{\Pr[\mathbf{W} = \mathbf{w} \mid C = k]}{\Pr[\mathbf{W} = \mathbf{w}]}$$

The second summation in the above equation is over all LLR vectors $\mathbf{w}$ with nonzero probability.

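By the lemma's condition, the tree assumption is satisfied. Thus, by Theorem 1, the conditional distribution of $\mathbf{W}$ given $C = 0$ is symmetric (see Appendix III-A). Using (19), we have

$$I(C;\mathbf{W}) = \frac{1}{q}\sum_{k=0}^{q-1}\sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w} \mid C = k]\log_q\frac{e^{-w_k}\Pr[\mathbf{W} = \mathbf{w} \mid C = 0]}{\frac{1}{q}\sum_{j=0}^{q-1} e^{-w_j}\Pr[\mathbf{W} = \mathbf{w} \mid C = 0]}$$
$$= \frac{1}{q}\sum_{k=0}^{q-1}\sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w}^{+k} \mid C = 0]\left(1 - \log_q\sum_{j=0}^{q-1} e^{-(w_j - w_k)}\right)$$
$$= 1 - \frac{1}{q}\sum_{k=0}^{q-1}\sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w}^{+k} \mid C = 0]\log_q\sum_{j=0}^{q-1} e^{-(w_j - w_k)}$$

By the definition of $\mathbf{w}^{+k}$, $w_j - w_k = w^{+k}_{j-k}$. Since the third summation is over all $j$, we obtain, by changing variables $j' = j - k$ (evaluated over GF(q)),

$$I(C;\mathbf{W}) = 1 - \frac{1}{q}\sum_{k=0}^{q-1}\sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w}^{+k} \mid C = 0]\log_q\sum_{j'=0}^{q-1} e^{-w^{+k}_{j'}}$$

Changing variables in the second summation, $\mathbf{w}' = \mathbf{w}^{+k}$, we obtain

$$I(C;\mathbf{W}) = 1 - \frac{1}{q}\sum_{k=0}^{q-1}\sum_{\mathbf{w}'}\Pr[\mathbf{W} = \mathbf{w}' \mid C = 0]\log_q\sum_{j=0}^{q-1} e^{-w'_j}$$

Since the sum over $\mathbf{w}'$ is independent of $k$, we obtain

$$I(C;\mathbf{W}) = 1 - \sum_{\mathbf{w}}\Pr[\mathbf{W} = \mathbf{w} \mid C = 0]\log_q\sum_{j=0}^{q-1} e^{-w_j}$$

(26) now follows from the fact that $w_0 = 0$ by definition (see Section II). □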
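As a sanity check of (26), one can compare it with a directly computed mutual information. The sketch uses a q-ary symmetric channel with symbol-error probability $p$, whose LLR vector given $C = 0$ takes only two shapes; the channel is our illustrative choice, and both functions should agree to rounding error.

```python
import math


def mi_from_26(q, p):
    """Right side of (26), 1 - E log_q(1 + sum_i exp(-W_i)), for the LLR vector of a
    q-ary symmetric channel (symbol-error probability p) given C = 0."""
    a = math.log((1 - p) / (p / (q - 1)))
    # Output y = 0 (probability 1 - p): W_i = a for every i != 0.
    total = (1 - p) * math.log(1 + (q - 1) * math.exp(-a), q)
    # Output y = j != 0 (probability p/(q-1) each): W_j = -a, W_i = 0 for the other q-2 entries.
    total += p * math.log(1 + math.exp(a) + (q - 2), q)
    return 1 - total


def mi_direct(q, p):
    """I(C; Y) of the same channel with uniform input, in q-ary units."""
    h = -(1 - p) * math.log(1 - p, q) - p * math.log(p / (q - 1), q)
    return 1 - h


for q in (2, 3, 7):
    print(q, mi_from_26(q, 0.2), mi_direct(q, 0.2))
```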
B. The Permutation-Invariance Assumption with EXIT Method 1

In this section, we discuss a fine point of the assumption of permutation-invariance used in the development of EXIT charts by Method 1 (Section VII-C). Strictly speaking, the initial message $\mathbf{R}'^{(0)}$ and rightbound messages $\mathbf{R}'_t$ are not permutation-invariant. However, we now show that we may shift our attention to $\tilde{\mathbf{R}}'^{(0)}$ and $\tilde{\mathbf{R}}'_t$, defined as in Theorem 3, which are symmetric and permutation-invariant.

We first show that $I(C;\tilde{\mathbf{R}}'^{(0)})$ and $I(C;\tilde{\mathbf{R}}'_t)$, evaluated using (26), are equal to $I(C;\mathbf{R}'^{(0)})$ and $I(C;\mathbf{R}'_t)$, respectively. It is straightforward to observe that the right-hand side of (26) is invariant to any fixed permutation of the elements of the random vector $\mathbf{W}$. Thus, a random-permutation will also have no effect on its value. By the discussion in Appendix IV-F, $\tilde{\mathbf{R}}'^{(0)}$ and $\tilde{\mathbf{R}}'_t$ are random-permutations of $\mathbf{R}'^{(0)}$ and $\mathbf{R}'_t$, respectively. Thus, we have obtained our desired result.

We proceed to show that the derivation of the approximation of $I_{E,VND}$ in Section VII-C is justified if we replace $\mathbf{R}'^{(0)}$ and $\mathbf{R}'_t$ with $\tilde{\mathbf{R}}'^{(0)}$ and $\tilde{\mathbf{R}}'_t$. By the discussion in Appendix IV-F, $\tilde{\mathbf{R}}'_t$ may be obtained by replacing the instantiation $\mathbf{r}'^{(0)}$ of $\mathbf{R}'^{(0)}$ in (15) with an instantiation of $\tilde{\mathbf{R}}'^{(0)}$. Thus, $\tilde{\mathbf{R}}'_t$ is obtained from $\mathbf{L}'_t$ and $\tilde{\mathbf{R}}'^{(0)}$ using the same expressions through which $\mathbf{R}'_t$ is obtained from $\mathbf{L}'_t$ and $\mathbf{R}'^{(0)}$. Therefore, the discussion of the derivation of the approximation for $I_{E,VND}$ (see Section VII-C) remains justified.

By the discussion in Appendix IV-F, the distribution of $\mathbf{L}_t$ is obtained from $\tilde{\mathbf{R}}_t$ using (10), and the distribution of $\mathbf{R}_t$ is not required for its computation. Finally, the approximation for $I_{E,CND}$ in Section VII-C has been verified empirically, and therefore does not require any further justification.

C. Gaussian Messages as Initial Messages of an AWGN Channel

Let $\mathbf{W}$ be a Gaussian LLR-vector random variable defined as in Theorem 6. Let $\Pr[\mathbf{w} \mid x]$ be the transition probabilities of the cyclic-symmetric channel defined by $\mathbf{W}$ (see Lemma 6 and the accompanying remark in Section V-C). We will now show that this channel is in effect a $(q-1)$-dimensional AWGN channel.

We begin by examining $\Pr[\mathbf{w} \mid x = i]$:

$$\Pr[\mathbf{w} \mid x = i] = \Pr[\mathbf{W} = \mathbf{w}^{+i}] = \Pr[\mathbf{W}^{-i} = \mathbf{w}]$$

Thus the channel output, conditioned on transmission of $i$, is distributed as $\mathbf{W}^{-i}$. The operation $\mathbf{w} \mapsto \mathbf{w}^{-i}$ is linear. Thus $\mathbf{W}^{-i}$ is Gaussian with a mean of $\mathbf{m}^{-i}$ ($\mathbf{m}$ being defined by (25)) and a covariance matrix which we will denote by $\Sigma^{(-i)}$. Let $k, l = 1, \ldots, q-1$. Since $W^{-i}_k = W_{k-i} - W_{-i}$,

$$\Sigma^{(-i)}_{k,l} = \Sigma_{k-i,\,l-i} - \Sigma_{k-i,\,-i} - \Sigma_{-i,\,l-i} + \Sigma_{-i,\,-i} \qquad (78)$$

where $\Sigma$ is given by (25) and we define, for convenience, $\Sigma_{0,j} = \Sigma_{j,0} = 0$ for all $j = 0, \ldots, q-1$ (also, recall from Section II that $k - i$ and $l - i$ are evaluated over GF(q)). Evaluating (78) for all $k, l = 1, \ldots, q-1$, it is easily observed that $\Sigma^{(-i)} = \Sigma$.
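Because $W^{-i}_k = W_{k-i} - W_{-i}$, the covariance of $\mathbf{W}^{-i}$ is $\Sigma_{k-i,l-i} - \Sigma_{k-i,-i} - \Sigma_{-i,l-i} + \Sigma_{-i,-i}$ under the convention $\Sigma_{0,j} = \Sigma_{j,0} = 0$. The sketch below evaluates this for the $\Sigma$ of (25) (diagonal $\sigma^2$, off-diagonal $\sigma^2/2$) and compares it with $\Sigma$ itself, assuming prime $q$.

```python
def sigma_entry(a, b, s2):
    """Sigma_{a,b} of (25), extended with Sigma_{0,j} = Sigma_{j,0} = 0; s2 = sigma^2."""
    if a == 0 or b == 0:
        return 0.0
    return s2 if a == b else s2 / 2


def shifted_cov(q, i, s2):
    """Covariance of W^{-i}, using W^{-i}_k = W_{k-i} - W_{-i}; q prime, indices mod q."""
    n = (-i) % q
    return [[sigma_entry((k - i) % q, (l - i) % q, s2) - sigma_entry((k - i) % q, n, s2)
             - sigma_entry(n, (l - i) % q, s2) + sigma_entry(n, n, s2)
             for l in range(1, q)] for k in range(1, q)]


def base_cov(q, s2):
    """Sigma of (25) itself, indices 1..q-1."""
    return [[sigma_entry(k, l, s2) for l in range(1, q)] for k in range(1, q)]


# Largest entrywise deviation over all shifts i for q = 5 (expected to be 0.0).
print(max(abs(x - y) for i in range(5)
          for r1, r2 in zip(shifted_cov(5, i, 1.7), base_cov(5, 1.7))
          for x, y in zip(r1, r2)))
```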

The above implies that the cyclic-symmetric channel defined by $\mathbf{W}$ is distributed as a $(q-1)$-dimensional AWGN channel whose noise is distributed as $\mathcal{N}(0, \Sigma)$ and whose input alphabet is given by $\delta(i) = \mathbf{m}^{-i}$. Both the noise and the input alphabet are functions of $\sigma$. By definition, this channel is cyclic-symmetric, and thus the LLR-vector initial messages of LDPC decoding satisfy $\mathbf{r}'^{(0)} = \mathbf{w}$, where $\mathbf{w}$ is the channel output.

In the sequel, we would like to consider channels whose input alphabet is independent of $\sigma$. For this purpose, we consider a channel whose output $\mathbf{y}$ is obtained from $\mathbf{w}$ by $\mathbf{y} = (2/\sigma^2)\cdot\mathbf{w}$. The result is equivalent to an AWGN channel whose input alphabet is given by $\delta(i) = (2/\sigma^2)\cdot\mathbf{m}^{-i} = \mathbf{1}^{-i}$, where $\mathbf{1} = [1, \ldots, 1]^T$, and whose noise is distributed as $\mathcal{N}(0, \Sigma_z)$, where $\Sigma_z = (2/\sigma^2)^2\Sigma$. Letting $\sigma_z = 2/\sigma$, we obtain that $\Sigma_z$ is defined as the matrix $\Sigma$ of (25) with $\sigma$ substituted by $\sigma_z$.

The multiplication by $2/\sigma^2$ does not affect the initial messages of LDPC decoding, and thus $\mathbf{r}'^{(0)} = \mathbf{w} = (\sigma^2/2)\cdot\mathbf{y} = (2/\sigma_z^2)\cdot\mathbf{y}$. We summarize these results in the following lemma.

Lemma 35: Consider transmission over a $(q-1)$-dimensional AWGN channel, and assume zero-mean noise with a covariance matrix defined as the matrix $\Sigma$ of (25) with $\sigma$ substituted by $\sigma_z$. Assume the following mapping from the code alphabet: $\delta(i) = \mathbf{1}^{-i}$, $i = 0, \ldots, q-1$, where $-i$ is defined using the LLR representation and $\mathbf{1}$ is defined above.

1) Let $\mathbf{y}$ denote the $(q-1)$-dimensional channel output and $\mathbf{r}'^{(0)}$ denote the LLR-vector initial message. Then $\mathbf{r}'^{(0)} = (2/\sigma_z^2)\cdot\mathbf{y}$.

2) Let the random variable $\mathbf{R}'^{(0)}$ denote the initial message, conditioned on the all-zero codeword assumption. Then $\mathbf{R}'^{(0)}$ is Gaussian distributed, and satisfies (25) with $\sigma = 2/\sigma_z$.
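Item 1 of Lemma 35 can be checked directly from the Gaussian log-likelihoods. The sketch below assumes prime $q$, builds the means $\mathbf{1}^{-i}$ from the LLR vector $\mathbf{1}$ with its implicit component $1_0 = 0$, and uses the closed-form inverse $\Sigma_z^{-1} = (2/\sigma_z^2)(I - J/q)$ of $\Sigma_z = (\sigma_z^2/2)(I + J)$, which is the $-V$ of Appendix VII with $m = \sigma_z^2/2$.

```python
import random


def llr_lemma35(y, q, sigma_z):
    """log p(y | delta(0)) - log p(y | delta(k)), k = 1..q-1, for the channel of Lemma 35.

    Assumptions: q prime (GF(q) = Z_q); noise covariance (sigma_z^2 / 2)(I + J);
    delta(i) = 1^{-i}, built from the LLR vector 1 with implicit component 0.
    """
    ones = [0.0] + [1.0] * (q - 1)

    def mean(i):
        # (1^{-i})_k = 1_{k-i} - 1_{-i}
        return [ones[(k - i) % q] - ones[(-i) % q] for k in range(1, q)]

    def quad(u):
        # u^T Sigma_z^{-1} u with Sigma_z^{-1} = (2 / sigma_z^2)(I - J/q)
        s = sum(u)
        return (2.0 / sigma_z ** 2) * (sum(x * x for x in u) - s * s / q)

    def loglik(i):
        return -0.5 * quad([a - b for a, b in zip(y, mean(i))])

    return [loglik(0) - loglik(k) for k in range(1, q)]


rng = random.Random(3)
q, sigma_z = 5, 1.3
y = [rng.gauss(1.0, sigma_z) for _ in range(q - 1)]
print(llr_lemma35(y, q, sigma_z))
print([2 / sigma_z ** 2 * v for v in y])  # item 1 of Lemma 35 predicts the same vector
```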

D. Properties and Computation of $J(\cdot)$

We examine $J(\sigma)$ along lines analogous to the development of ten Brink [36] for binary codes. In Appendix VIII-C we showed that a Gaussian $\mathbf{W}$ distributed as in Theorem 6 and characterized by $\sigma$ may equivalently be obtained as the initial message, under the all-zero codeword assumption, of a $(q-1)$-dimensional AWGN channel characterized by a parameter $\sigma_z = 2/\sigma$. The capacity of this channel is $J(\sigma) = I(C;\mathbf{W})$. The parameter $\sigma_z$ induces an ordering on the AWGN channels such that channels with a greater $\sigma_z$ are degraded with respect to channels with a lower $\sigma_z$. Thus $J(\sigma)$ is monotonically increasing and $J^{-1}(\cdot)$ is well defined. As $\sigma \to \infty$, $\sigma_z$ approaches zero. Thus

$$\lim_{\sigma\to\infty} J(\sigma) = 1$$

Similarly,

$$\lim_{\sigma\to 0} J(\sigma) = 0$$

To compute $J(\cdot)$ and $J^{-1}(\cdot)$, we need to evaluate (26) for a Gaussian random variable as defined in Theorem 6. Following [35], we evaluate (26) for values of $\sigma$ along a fine grid in the range $\sigma \in (0, 6.5)$ (6.5 being selected because $J(6.5) \approx 1$), and then apply a polynomial best-fit to obtain an approximation of $J(\cdot)$ and $J^{-1}(\cdot)$ (note that this operation is performed once: the resulting polynomial approximations of $J(\cdot)$ and $J^{-1}(\cdot)$ are the same for all codes).

In [35] the equivalent $J(\cdot)$ was evaluated by numerically computing the one-dimensional integral by which the expectation is defined. In our case, the distribution of $\mathbf{W}$ is multidimensional, and is more difficult to evaluate. We therefore evaluate the right-hand side of (26) empirically, by generating random samples of $\mathbf{W}$ according to Theorem 6.
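A minimal Monte Carlo sketch of this empirical evaluation follows. Samples of $\mathbf{W}$ are generated as $\frac{\sigma^2}{2}\mathbf{1} + \frac{\sigma}{\sqrt{2}}(\mathbf{Z} + Z_0\mathbf{1})$ with independent standard normals $\mathbf{Z}$ and $Z_0$, which reproduces the mean and covariance of (25). The sample size, seed and grid are our assumptions, and the polynomial best-fit step is not shown.

```python
import math
import random


def sample_w(q, sigma, rng):
    """W = (sigma^2/2) 1 + (sigma/sqrt 2)(Z + Z_0 1): mean and covariance of (25)."""
    z0 = rng.gauss(0.0, 1.0)
    s = sigma / math.sqrt(2.0)
    return [sigma * sigma / 2 + s * (rng.gauss(0.0, 1.0) + z0) for _ in range(q - 1)]


def j_mc(sigma, q, n=20000, seed=0):
    """Monte Carlo estimate of J(sigma) = 1 - E log_q(1 + sum_i exp(-W_i)), as in (26)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        w = sample_w(q, sigma, rng)
        acc += math.log1p(sum(math.exp(-wi) for wi in w))
    return 1.0 - acc / (n * math.log(q))


for sigma in (0.5, 1.5, 3.0, 6.5):
    print(sigma, round(j_mc(sigma, 4), 3))
```

Reusing the same seed for every grid point (common random numbers) keeps the estimated curve smoother, which helps any later curve fit.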

E. Computation of $J_R(\sigma; \sigma_z, \delta)$

The computation of $J_R(\sigma; \sigma_z, \delta)$ is performed along lines analogous to the computation of $J(\sigma)$ as described in Appendix VIII-D. We compute $J_R(\sigma; \sigma_z, \delta)$ for fixed values of $\sigma_z$ and $\delta$ and for values of $\sigma$ along a fine grid in the range $\sigma \in (0, 6.5)$. We then apply a polynomial best-fit to obtain an approximation of $J_R(\sigma; \sigma_z, \delta)$ for all $\sigma$ and an approximation of $J_R^{-1}(I; \sigma_z, \delta)$.

To compute $J_R(\sigma; \sigma_z, \delta)$ at a point of the above grid, we evaluate the right-hand side of (26) (replacing $\mathbf{W}$ with a rightbound LLR-vector message $\mathbf{R}'$) empirically. Samples of $\mathbf{R}'$ are obtained by adding samples of initial messages to those of intermediate values. The samples of the initial messages are produced using Lemma 12 (with the coset symbol $v \in \{0, \ldots, q-1\}$ randomly selected with uniform probability). The samples of the intermediate values, for a given $\sigma$, are produced using Theorem 6.

Note that unlike $J(\sigma)$, which satisfies $J(0) = 0$, $J_R(0; \sigma_z, \delta)$ is greater than zero. This results from the fact that the distribution of the rightbound message $\mathbf{R}'$ corresponding to $\sigma = 0$ is equal to that of the initial message $\mathbf{R}'^{(0)}$, and $I(C;\mathbf{R}'^{(0)}) > 0$. Letting $I^{(0)} = I(C;\mathbf{R}'^{(0)})$, we have that $J_R^{-1}(I; \sigma_z, \delta)$ is not defined in the range $I \in [0, I^{(0)})$.
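The sampling procedure for $J_R(\sigma; \sigma_z, \delta)$ can be sketched the same way: draw an initial LLR vector for a uniformly random coset symbol $v$, add a Gaussian intermediate value distributed as in (25), and average (26). The channel here is our own illustrative assumption (a scalar AWGN channel with noise standard deviation $\sigma_z$ and a hypothetical 5-PAM mapping `pam5`), with $q$ prime so GF(q) addition is modulo $q$; the initial message takes the coset form $r'_k = \log\Pr[y \mid \delta(v)] - \log\Pr[y \mid \delta(k+v)]$.

```python
import math
import random


def initial_llr(q, delta, sigma_z, rng):
    """Initial LLR vector under the all-zero codeword with a uniform coset symbol v.

    Illustrative assumptions: scalar AWGN channel with noise std sigma_z, q prime
    (GF(q) addition mod q), hypothetical mapping `delta`.
    r'_k = log Pr[y | delta(v)] - log Pr[y | delta(k + v)].
    """
    v = rng.randrange(q)
    y = delta[v] + rng.gauss(0.0, sigma_z)

    def loglik(a):
        return -((y - a) ** 2) / (2 * sigma_z ** 2)

    return [loglik(delta[v]) - loglik(delta[(k + v) % q]) for k in range(1, q)]


def gaussian_w(q, sigma, rng):
    """Intermediate value distributed as in (25) (sigma = 0 gives the zero vector)."""
    z0 = rng.gauss(0.0, 1.0)
    s = sigma / math.sqrt(2.0)
    return [sigma * sigma / 2 + s * (rng.gauss(0.0, 1.0) + z0) for _ in range(q - 1)]


def j_r_mc(sigma, q, delta, sigma_z, n=20000, seed=0):
    """Monte Carlo estimate of J_R(sigma; sigma_z, delta) via (26) applied to R'."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        r0 = initial_llr(q, delta, sigma_z, rng)
        r = [a + b for a, b in zip(r0, gaussian_w(q, sigma, rng))]
        acc += math.log1p(sum(math.exp(-x) for x in r))
    return 1.0 - acc / (n * math.log(q))


pam5 = [-4.0, -2.0, 0.0, 2.0, 4.0]  # hypothetical 5-PAM mapping
print(j_r_mc(0.0, 5, pam5, 1.5), j_r_mc(3.0, 5, pam5, 1.5))
```

In line with the remark above, the estimate at $\sigma = 0$ is strictly positive, since it equals $I(C;\mathbf{R}'^{(0)})$.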

F. Computation of $I_{E,CND}(I_A; j, \sigma_z, \delta)$

Our development begins along the lines of Appendices VIII-D and VIII-E. We compute $I_{E,CND}(I_A; j, \sigma_z, \delta)$ for fixed values of $\sigma_z$ and $\delta$ and for values of $I_A$ along a fine grid. We then apply a polynomial best-fit to obtain an approximation of $I_{E,CND}(I_A; j, \sigma_z, \delta)$ for all $I_A$ in this range.

To compute $I_{E,CND}(I_A; j, \sigma_z, \delta)$ at a point of the above grid, we again evaluate the right-hand side of (26) empirically. We begin by applying $J_R^{-1}(I_A; \sigma_z, \delta)$ to obtain the value of $\sigma$ which (together with $\sigma_z$ and $\delta$) characterizes the LLR-vector rightbound message distribution. We then produce samples of rightbound messages as described in Appendix VIII-E. We also produce samples of labels $g \in GF(q)\setminus\{0\}$ that are required to compute the leftbound samples $\mathbf{l}'$ of $\mathbf{L}'$. The label samples are generated by uniform random selection. We use the samples $\mathbf{l}'$ of $\mathbf{L}'$ to empirically evaluate the right-hand side of (26) (replacing $\mathbf{W}$ with $\mathbf{L}'$) and obtain $I_{E,CND}(I_A; j, \sigma_z, \delta)$. Note that computing (26) with $\tilde{\mathbf{L}}'$ instead of $\mathbf{L}'$ had no effect on the final result.
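The leftbound samples in this procedure come from the check-node operation applied to labelled rightbound messages. A generic probability-domain sketch of that operation follows. It assumes the standard nonbinary parity check $\sum_n g_n c_n = 0$ over GF(q) with $q$ prime; the label conventions of the main text are not reproduced here, so the function name and argument layout are hypothetical.

```python
def leftbound(incoming, labels_in, label_out, q):
    """Outgoing check-node message (probability domain), generic sketch.

    Assumes the parity check sum_n g_n c_n = 0 over GF(q), q prime.
    incoming[n][a] = Pr[c_n = a] on the other edges, labels_in[n] = g_n (nonzero),
    label_out = label of the outgoing edge. Returns Pr[c_out = a] for a = 0..q-1.
    """
    s = [1.0] + [0.0] * (q - 1)              # distribution of the empty sum (= 0)
    for p, g in zip(incoming, labels_in):
        scaled = [0.0] * q
        for a in range(q):
            scaled[(g * a) % q] += p[a]      # distribution of g * c_n
        s = [sum(s[k] * scaled[(j - k) % q] for k in range(q)) for j in range(q)]
    inv = pow(label_out, q - 2, q)           # label_out^{-1} in GF(q), q prime
    out = [0.0] * q
    for sv in range(q):
        out[(-inv * sv) % q] += s[sv]        # c_out = -label_out^{-1} * s
    return out


def one_hot(a, q):
    return [1.0 if b == a else 0.0 for b in range(q)]


# c_1 = 1 with label 3 and c_2 = 4 with label 1 give s = 2 (mod 5), so the
# outgoing symbol with label 2 must be 4, since 2*4 + 2 = 10 = 0 (mod 5).
print(leftbound([one_hot(1, 5), one_hot(4, 5)], [3, 1], 2, 5))  # [0.0, 0.0, 0.0, 0.0, 1.0]
```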

Finally, $I_{E,CND}(I_A; j, \sigma_z, \delta)$, as defined in Section VII-E, like $J_R^{-1}(I; \sigma_z, \delta)$ (discussed in Appendix VIII-E), is not defined for $I_A \in [0, I^{(0)})$. This interval is not used in the EXIT chart analysis of Section VII-E.